INTRODUCTION


Many types of anatomical characters have been employed to distinguish
the higher taxonomic categories of mammals. At the lower levels, and
particularly for infraspecific categories, only three criteria are
generally useful. One of them involves pelage color, which is not the concern
of this paper. The second is absolute size. The third involves the
proportions of various parts of the body, chiefly those of the skull. It is
now usually accepted that variations in size and body proportions among 
vertebrates are at least in part genetically determined. Dice (1932 and 
subsequent papers), for example, has shown that the adult body size of 
field-caught samples of numerous races of Peromyscus tends to be
perpetuated in their laboratory-reared descendants. He has shown,
furthermore, that in some of these races there is a large amount of local
genetic variation in the proportions of several parts of the body.

Use in systematic zoology of proportions or ratios between certain 
body parts is so common that it would be difficult to cite a single
extensive taxonomic paper dealing with subspecific categories which does not
employ them. In some instances only one or two proportional differences 
were used; for example, to distinguish between two races (Hooper, 1943) 
or to distinguish between domesticated dogs and coyotes (Howard, 1949). 
Proportions as taxonomic criteria have been shown to be valid only under 
certain circumstances, yet the use of simple ratios by vertebrate
systematists has persisted.

It seems probable, in view of the amount of research that has gone 
into detecting the characters which distinguish geographic races of
mammals, that no other useful morphological criteria, besides the three
referred to, are likely to be discovered. Taxonomists will, therefore, be
forced to continue to employ size and proportional differences in future 
studies of minor variation. Because of the uncertainty about the reliabil- 
ity of proportional differences and the fact that nongenetic factors, such 
as nutrition, can affect adult size, further work along these lines may 
seem useless. There is a method of analysis available, however, which
combines the size and proportional character differences into a single 
measure, at the same time eliminating much of the difficulty associated 
with either considered separately. The method is a simple form of co- 
variance analysis. This is an extension of analysis of variance to in- 
clude two or more variates considered simultaneously. Although covari- 
ance analysis has been available for a number of years, it is only slowly 
coming to be recognized by mammalogists as a useful taxonomic tool.
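
The simple form of covariance analysis referred to above can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the computation carried out in this paper: it follows the usual textbook layout in which residual sums of squares about separate lines, about parallel lines, and about a single line are compared. The function names and the two small samples are hypothetical, and the numbers are invented for the example.

```python
def sums_of_squares(xs, ys):
    """Return n and the corrected sums Sxx, Sxy, Syy for one sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return n, sxx, sxy, syy


def residual_ss(sxx, sxy, syy):
    """Sum of squared deviations of y from its least-squares line on x."""
    return syy - sxy ** 2 / sxx


def covariance_analysis(groups):
    """groups: list of (xs, ys) pairs, one pair per population."""
    stats = [sums_of_squares(xs, ys) for xs, ys in groups]
    k = len(groups)
    n_total = sum(s[0] for s in stats)

    # Residuals about a separate line fitted to each group.
    ss_separate = sum(residual_ss(s[1], s[2], s[3]) for s in stats)

    # Residuals about parallel lines sharing the pooled within-group slope.
    wxx = sum(s[1] for s in stats)
    wxy = sum(s[2] for s in stats)
    wyy = sum(s[3] for s in stats)
    ss_common = residual_ss(wxx, wxy, wyy)

    # Residuals about a single line fitted to all specimens together.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    _, txx, txy, tyy = sums_of_squares(all_x, all_y)
    ss_total = residual_ss(txx, txy, tyy)

    # F ratio for a difference in slope (proportional change with size).
    f_slopes = ((ss_common - ss_separate) / (k - 1)) / (
        ss_separate / (n_total - 2 * k))
    # F ratio for a difference in elevation, given a common slope.
    f_elevations = ((ss_total - ss_common) / (k - 1)) / (
        ss_common / (n_total - k - 1))
    return {"common_slope": wxy / wxx,
            "f_slopes": f_slopes,
            "f_elevations": f_elevations}


# Invented data: two samples with nearly the same slope, different elevation.
xa = [10, 11, 12, 13, 14]
ya = [9.1, 9.7, 10.6, 11.5, 12.1]
xb = [12, 13, 14, 15, 16]
yb = [11.5, 12.5, 13.2, 13.9, 14.9]

result = covariance_analysis([(xa, ya), (xb, yb)])
print(result)
```

Each F ratio would be referred to tabulated values with (k - 1, N - 2k) and (k - 1, N - k - 1) degrees of freedom respectively, where k is the number of samples and N the total number of specimens.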

There are several reasons why taxonomists have failed to employ 
covariance analysis. First, the number of specimens available to
represent a local population is usually small. This difficulty, of course,
obtains with any sort of analysis. Second, the method does require more 
elaborate computation than those ordinarily in use, which involve calcula- 
tion of ratios, means, and variances only. Third, there has also been a 
lack of understanding of the principles involved, and considerable doubt 
of the validity of its application to museum material. Finally, many 
systematists believe that the age of a specimen is a necessary item of
information and, as is well known, age of wild mammals or other
terrestrial vertebrates can be estimated only in terms of broad age groups.

One of the purposes of this paper is to suggest how these difficulties 
may in many cases be overcome. As regards sample size, all collectors
know that the number of adult specimens obtainable on a collecting trip 
is limited; sometimes the number is indeed small. In the temperate
latitudes, at least, the catch of a field party will vary considerably in age
composition and, hence, in size of individual, depending upon the season 
in which the collection is made. Throughout much of the time from 
spring to early winter, small mammals of all ages will be represented. 
Juveniles and adolescents often comprise the majority of the total catch.
Customarily collectors preserve only those specimens which are judged
to be either subadult or fully adult, on the grounds that any comparisons
must be made among groups of approximately the same age. Although
it seems obvious that a sample primarily of juveniles ought not to be
compared with another mostly of adults, it will be shown that sometimes
such comparisons can safely be made. It will also be shown that
under certain conditions, which have been fulfilled by the present data,
the best comparisons can be made among samples consisting of individuals
of all ages.

For clear accounts of the principles involved in making comparisons 
by covariance analysis see Reeve (1940) and Snedecor (1946). Certain 
considerations not fully discussed by them are presented later in the 
present paper. The validity of the use of covariance analysis for com- 
paring measurements must be considered. Even though the applicabil- 
ity of the analysis when applied to the model for which it was intended 
may be accepted, it has to be determined whether any particular data 
give a sufficiently close approximation to the model, so that the con- 
clusions drawn are likely to be correct. Examination of this problem 
with respect to the data used here occupies a large portion of this report. 

Since 1900 a tremendous literature has developed in the general field 
of relative growth, of which this subject is a part. Papers dealing with 
the relationships between two or more attributes of an organism would 
doubtless fill many volumes. Over fifty years ago Donaldson (1898)
and Donaldson and Shoemaker (1900) demonstrated in two species of 
frogs that the length of the hind limb bore a constant relationship to the 
length of the body and also that the three components of the hind limb 
were in constant ratio to one another over a considerable range of limb 
length. Crozier and Hecht (1913) and Hecht (1916) investigated the re- 
lationships between several external measurements of various fishes 
and demonstrated that these measurements were simple linear functions 
of the length of the fish. Donaldson (1924) made available a vast amount 
of mensuration data on the albino rat and the Norway rat, much of it col- 
lected from his earlier writings. A large part of his data is presented 
as graphs of the relations, usually curvilinear, between various attributes.

Undoubtedly, the major stimulus for research in relative growth 
was provided by the work of Huxley, which he summarized in book form 
(1932). Huxley attempted to establish a general theory concerning 
growth. Briefly, he argued that growth is a process of multiplication 
and that the growth of one part of the body relative to another part, or 
relative to the remainder of the body, is capable of expression by the 
equation Y = bX^k, where Y designates, for example, the weight of the
crusher claw of a crab, X the weight of the rest of its body, and b and k 
are constants that characterize the particular species and comparison 
in question. This is called the allometry equation. In addition, Huxley 
attempted to provide a theoretical interpretation for the constants, but 
in this, in the opinion of many, he was not notably successful. 
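
As an illustration of how the two constants of the allometry equation can be estimated, the sketch below fits a straight line by least squares to the logarithms of X and Y. The function name and the data are hypothetical; the values of Y are generated from b = 0.5 and k = 1.5 without error, so the fit simply recovers those constants.

```python
import math


def fit_allometry(xs, ys):
    """Estimate b and k in Y = b * X**k by least squares on logarithms."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    k = sxy / sxx                       # slope of log Y on log X
    b = math.exp(mean_y - k * mean_x)   # antilog of the intercept
    return b, k


# Invented, error-free data: weight of the rest of the body (X) and of a part (Y).
body = [1.0, 2.0, 4.0, 8.0, 16.0]
part = [0.5 * x ** 1.5 for x in body]

b, k = fit_allometry(body, part)
print(b, k)
```

With real measurements the points scatter about the line, and the estimates depend on which variate is treated as independent and on the scale in which the fit is made.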

The publication of Huxley’s book provoked a flood of critical papers.
Davenport (1934) criticized the use of what he called mass statistics 
for studying the growth of organisms and presented data on human
measurements to prove that the averaging effect of employing them
completely obscures the growth patterns in particular individuals. Bernstein
(1934) criticized the derivation of Huxley’s equation on mathematical 
grounds and showed that it is approximated by the autocatalytic growth 
curve for a limited range of values. Wilson (1934) gave a sensible
summary of the problems and pitfalls in the application of mathematical
equations to the description and explanation of growth.

Feldstein and Hersh (1935a) compared the results of fitting the 
arithmetic values of X and Y in the equation Y = bX^k with the method
advocated by Huxley of fitting to log X and log Y in the logarithmic trans- 
formation log Y = log b + k log X. Naturally, the two methods give dif- 
ferent answers, but the authors were not able to decide which was the 
appropriate one for the purpose. Feldstein and Hersh (1935b) also pub- 
lished a method for machine computation of the constants. Lumer (1939) 
discussed the relationships between the two constants, a matter of some 
concern among biologists, since the value of b is partly determined by 
the value of k. Investigations ultimately became so diversified that
Huxley and Tessier (1936) published a note pleading for standardization of
terminology and notation, a plea which has not been very successful.
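
The two fitting criteria compared by Feldstein and Hersh can be set side by side. The notation below follows the allometry equation; the index i over specimens and the symbols Q_arith and Q_log are introduced here only for illustration.

```latex
\begin{align*}
Y &= bX^{k} \\
\log Y &= \log b + k \log X \\
Q_{\text{arith}}(b,k) &= \sum_{i} \bigl(Y_i - bX_i^{k}\bigr)^{2} \\
Q_{\log}(b,k) &= \sum_{i} \bigl(\log Y_i - \log b - k \log X_i\bigr)^{2}
\end{align*}
```

Minimizing Q_arith weights absolute deviations, so the largest specimens dominate the fit, whereas minimizing Q_log weights deviations roughly in proportion to the size of Y. Unless the points lie exactly on the curve, the two criteria therefore lead to different values of b and k.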

Huxley (1932) summarizes extensive and diverse data which are 
fitted by the allometry equation. Two points should be noted in this con- 
nection. At times many data when plotted show what appears to me to 
be a curvilinear trend. Many authors have dealt with this problem by 
fitting two or more straight-line segments to the data. This procedure 
has evoked considerable comment and will be discussed later. Huxley 
apparently accepted these supposed breaks in the data, since he origin- 
ally believed that the exponent k would be constant, at least for consider- 
able periods of time, for any given species and pair of measurements. 
He soon (Dawes and Huxley, 1934) discovered a case in which it was not 
and this instance he recognized as a case of curvilinear regression. 

Until comparatively recently the bulk of the investigations has been 
concerned directly with growth itself and the question of how adequately 
the allometry equation indicated the basis for it. The only generally 
accepted conclusion is that the allometry equation can be regarded to 
apply descriptively, in a statistical sense, to very diverse purposes.

It has, however, also been abundantly established that other types of 
equations at times supply adequate fit to data. The literature relative 
to growth per se has been well summarized by Reeve and Huxley (1945).
Lately there has been a tendency to employ these growth equations for
descriptive purposes only and not to be greatly concerned with the basic
“law” of growth itself.

The interest in this investigation is not in attempting to deduce the 
facts about individual growth from the type of curve that is fitted to the 
data, but rather in the employment of a statistical technique as a means 
of measuring the size and proportional differences between two given 
populations of organisms. Many papers, particularly those by wildlife
managers, give extensive tables of measurements, often broken down
into age classes, but these measurements are usually not suited to the
present purpose. In some articles, however, such relations have been
presented graphically as growth (of several dimensions) plotted against
the known age (in days). For Peromyscus, for instance, Svihla (1934)
gave graphs of the usual external measurements and weight against age
in days for laboratory-reared individuals of one race of P. maniculatus.
Dice and Bradley (1942) did the same for seven other races of the same
species. McCabe and Blanchard (1950) presented extensive data relative
to age for three species, P. maniculatus, truei, and californicus.

Taxonomic studies in which the relations considered are between
two or more measurements other than weight and age have not been
numerous. In mammals Green (1933) studied differential growth in the
crania of two species of Mus and their hybrids. Reeve (1940, 1942)
investigated the facial index in three genera of anteaters. Butler (1941)
used relative growth to compare skulls of two species of insectivores.
Clark (1941) made an extensive report of correlations between various
pairs of external and skeletal measurements in Peromyscus. Gould and
Kreeger (1948) investigated variation in muskrat skulls. Gentile (1952)
studied various dimensions of the skull of the Norway rat. Tanaka (1940,
1942, 1944a, 1944b; Aoki and Tanaka, 1938) has published extensively on
biostatistics. Although their methods and purposes were somewhat
different, the entire output of the biometrical school of Galton and his
followers could be cited.

Relative growth analysis has other general uses as suggested by
Huxley (1932). Robb (1935) indicated that the facial index (facial
length/skull length), when studied as a regression, exhibits no change in slope
in horses from Hyracotherium to Equus and, furthermore, that the re- 
gression coefficient is essentially the same as that which obtains in the 
ontogeny of the modern horse. Reeve and Murray (1942) re-examined 
Robb’s data and came to different conclusions. Lumer (1940) applied 
the allometry equation to 28 modern breeds of dogs and to a few
specimens of prehistoric ones and distinguished six “allometric tribes.”
Hersh (1934) showed that certain pairs of dimensions in titanotheres had 
the same regression lines over a long phylogenetic series. Gray’s (1945) 
data indicate that the relation between heart weight and body weight in 
several families of rodents has approximately the same value. Martin 
(1949) used the regression method to study the effects of environmental 
influences on control of body form in fishes. Tryon (1951) attempted 
to distinguish ecologic from genetic variation in pocket gophers.
Hildebrand (1952) studied the variation in body proportions among the
members of the family Canidae.

Of all the zoological papers examined, only one, that of Reeve (1940),
deals with tests of significance. His study presented a method of
comparing regression lines. Although Reeve’s technique is identical with a
simple form of covariance analysis which antedates it (cf. Snedecor,
1934), he was apparently the first to call the attention of many biologists
to the availability of a suitable technique for testing the significance of
differences between samples, when these are treated as regressions.
In spite of the innumerable articles which have been concerned with 
some phase of the general problem, very few have dealt with the em- 
ployment of covariance analysis to mammalian taxonomic problems. 
None has even attempted to demonstrate that the analysis could be 
legitimately applied to comparison of measurements taken on small 
mammals. Because of such lack, it is perhaps no wonder that many 
workers, who might profit from the use of similar analyses in their own 
material, have been unwilling to accept the simple form of covariance 
analysis advocated here. 

This study is primarily concerned with an evaluation of the applic- 
ability of covariance analysis to the comparison of various measure- 
ments between two populations of mice. It is desirable to evaluate the 
applicability of the analysis with respect to specific data, since one
body of data may be suitable for covariance analysis, whereas another, 
although superficially similar, may not be. The animals used and the 
procedures for collecting the data are described briefly. 


MATERIALS


The mice used in this study were with a few exceptions born and 
raised in the Laboratory of Vertebrate Biology. Exceptions are the 
animals which had been trapped in the field and kept in the laboratory. 
Two clearly defined subspecies of the species Peromyscus maniculatus 
were selected for investigation. They are P.m. bairdi, a relatively 
small, prairie-dwelling race, and P.m. gracilis, a larger, forest- 
inhabiting form. These particular races were chosen because they were 
already available in the colony maintained at the Laboratory and be- 
cause they were known to differ in certain anatomical characters. Dif- 
ference in size between the two races roughly approximates the total 
range of size within the species. 

The mice were reared in the laboratory according to the standard 
techniques described by Dice (1929, 1934, 1947). The age at which the 
individual mice were killed for specimens was determined by a number
of circumstances. Many of the mice of the two races were cross-mated
to produce F1 hybrids for subsequent studies. The fertility of crosses
between the two races in the laboratory is low, and so has been the
fertility of gracilis when mated inter se. As it was deemed unwise to
interfere in any way with the production of hybrids, entire sibships of both
races, if the sibships were small, were held for a considerable period
of time, usually over one year, for use in breeding. In large sibships
some individuals were selected haphazardly to be killed at younger ages. 

The original stock of bairdi was secured from several localities in 
and around Ann Arbor in 1946 and 1947 by Van T. Harris. It is known 
that the field-caught mice represent several local subpopulations. 
Forty-one of the field-caught mice produced one or more, usually
several, laboratory-born offspring. Since a few of these offspring were
fathered by unknown wild males, it is likely that the total number of 
mice from which the stock is descended is a little higher than 41. No 
attempt was made to trace the pedigrees further, for in a generation or 
two this number of mice would leave a hopelessly intertwined genealogy. 
Because the bairdi stock has been very productive in the laboratory, the 
laboratory population may be considered a good representative of the 
wild population living in the vicinity of Ann Arbor. 

To balance the preponderance of older mice in the sample, some 
specimens of the same stock of bairdi that had been reared by Francis 
C. Evans for other studies were used. This group included mice killed 
at ages ranging from 15 days to one year. All available specimens of 
mice in several of the younger age classes were measured. For these 
mice the external measurements were not available. Usually, a 20-day- 
old skeleton was sufficiently well ossified so that, if prepared carefully, 
it could be measured accurately. Most of the skeletons of 15-day-old 
mice, however, could not be measured. In addition, three bairdi sib- 
ships of forty or more mice each were produced and the mice killed at 
ages ranging from one month to one year. These mice were collected 
in order to secure some sibships in which specimens covering a wide 
range of age and size would be represented. Incidentally, the mice in 
these three sibships make up about one-third of the specimens of bairdi 
studied. 

The situation in the gracilis stock is different. The stock originated 
from a small number of animals captured by W. Frank Blair in 1940 in 
Alger County, Michigan. Nine of these mice became the ancestors of
four to six generations of laboratory-bred descendants. In 1947 another 
group of gracilis, taken by Van T. Harris from the same general vi- 
cinity, was combined with the previous stock. Of these new mice, only 
three left progeny, each of the three by crossing with a member of the 
original stock. The three resulting sibships were composed, respec- 
tively, of 27, 45, and 5 mice. Each of the three newcomers contributed 
to the heredity of the stock thereafter. The majority of the gracilis that 
have been studied here were reared after the introduction of the three 
additional parents. Indeed, the largest sibship available in the study was 
the one which contained the 45 mice. 

The stock of gracilis cannot be considered to adequately represent
the wild population living at the locality from which its ancestors came. 
The laboratory-bred animals, however, do appear in general to be 
similar to wild gracilis and they are clearly different from bairdi. No 
specimens of young gracilis were available to supplement the series of 
older ones. Accordingly, about 40 young gracilis from ten sibships 
were prepared as specimens at ages which ranged from about 25 to 40 
days. The bones of a 20-day-old gracilis were too poorly ossified to 
be measured with accuracy. There may possibly be some bias in the 
gathering of the specimens of gracilis in that the young mice in general 
represent different sibships from the older ones. In view of the small 
number (12) of wild mice, however, that contributed to the heredity of 
the stock and the consequent inbreeding, any such bias is probably very 
small. 


METHODS OF STUDY 


Measurements 


The mice to be prepared for specimens were killed by etherization. 
Immediately after death the external measurements were taken in the
manner described by Dice (1932): 


Total length. From the tip of the nose to the end of the tail, not including the 
tail hairs. Measured on a special rule, the animal being suspended by the tail, by 
which weight of the body gently extends the animal. The reading is to the nearest 
millimeter. This measurement is not included in the tables. 

Body length. Derived by subtracting tail length from total length. 

Tail length. Distance from the upper base of the tail to the tip of the tail, not in- 
cluding the hairs. Measured by allowing the body to hang over the edge of the table, 
the tail being extended along the measuring rule. Recorded to the nearest milli- 
meter. 

Hind foot. Length of the left hind foot, from the heel to the end of the middle 
claw, the foot being held straight between the thumb and forefinger. Measured by 
vernier caliper to 0.1 millimeter. 


Ear from notch. Length of left ear, from notch at lower end of ear opening to
the tip of the ear. Measured by special inside-outside vernier caliper reading to 0.1
millimeter.


All external measurements were taken and recorded in triplicate. 
Computations have been made on the sums of these triplicate measure- 
ments. This is equivalent to computations made on the means of the 
triplicate measurements expressed in units of 0.3333 mm for body 
length and tail length, and in units of 0.0333 mm for hind-foot length 
and ear length. In all summaries of the analyses, however, the results 
have been converted into millimeters. 
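
The equivalence between sums of triplicates and means in smaller units is simple arithmetic, sketched below. The readings are invented for illustration and the helper name is hypothetical.

```python
# Hypothetical triplicate readings for one mouse.
tail_readings_mm = [62, 63, 62]           # read to the nearest 1 mm
foot_readings_tenths = [201, 202, 201]    # read to 0.1 mm, stored in tenths of a mm


def mean_mm_from_sum(total, readings_per_mm):
    """Convert the sum of three readings into the mean in millimeters."""
    return total / (3 * readings_per_mm)


tail_sum = sum(tail_readings_mm)          # 187, i.e. 187 units of 1/3 mm
foot_sum = sum(foot_readings_tenths)      # 604, i.e. 604 units of 1/30 mm

tail_mean_mm = mean_mm_from_sum(tail_sum, 1)     # 187 / 3
foot_mean_mm = mean_mm_from_sum(foot_sum, 10)    # 604 / 30

print(tail_mean_mm, foot_mean_mm)
```

Because the sum is only the mean multiplied by a constant, analyses carried out on the sums differ from those on the means solely by a change of scale, which is removed when results are converted back to millimeters.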

After the external measurements had been taken, the mice were 
prepared as flat skins and part skeletons. The distal long bones of the 
limbs and the bones of the feet were left in the skins and were not avail- 
able for study. The remaining part of the skeleton was dried and then 
cleaned by dermestid beetles as advocated by Hooper (1950). The
cleaned skeletons were then stored in glass vials and later measured. 
The measurements of the skull and skeleton are not all identical with 
those taken by Dice (1932). Those which are the same are quoted from 
the 1932 paper. All skull and skeletal measurements were made with a 
bench micrometer which reads to 0.01 mm and recorded to 0.01 mm. 
All were taken under a wide-angle dissecting microscope giving a mag- 
nification of 3.6 diameters. Five different measurements were taken. 

Length of femur.—Greatest length of the left femur from the con- 
dyles to the dorsal surface of the head. 

Length of humerus.—Greatest length of the left humerus from the 
condyles to the proximal end of the head. 

“Skull length.—From left condyle to most anterior part of left pre- 
maxilla. The point on the premaxilla is taken slightly below the nasal 
and is not in the median line of the skull.” 

“Mastoid breadth of skull.—Width of skull across mastoid pro- 
cesses.” 

Depth of skull.—Distance from the palatal shelf between the third 
molars to the top of the skull, measured perpendicular to the palatal 
shelf. 

In any case where the measurement was supposed to be taken on 
the left side, and this was not possible because of breakage, the mea- 
surement was made on the right side. No distinction in the computa- 
tions was made of this difference, but the information was recorded. 

In the doctoral dissertation on which the present paper is based 
(McIntosh, 1954), the several measurements and age of the mice are 
given in the appendix. 

Preliminary studies indicated that skeletal measurements could be 
replicated with an average discrepance which varied from less than 
0.01 mm for the lengths of the humerus and femur to a maximum of 
about 0.03 mm for skull depth. Errors of measurement can play a very 
important part in covariance analysis. One of the variates must be 
designated as the independent variate. This is the variate that will in 
effect be used as a standard against which to compare other measure- 
ments. In reality, there is no such thing as a true independent variate 
in the case where two measurements of an organism are being com- 
pared. There has to be an element of causality involved with a true in- 
dependent variate, such as observed in the case of an experiment to test 
the relative ability of several fertilizers to stimulate plant growth. 
Here some degree of causality may reasonably be ascribed to the pres- 
ence of the fertilizer. But one cannot readily think of a femur, for ex- 
ample, causing a particular humerus length to be associated with it. 
However, because of the common usage of the term independent variate 
in the literature in connection with problems like the present one, its 
use is retained here. 

The conventional bivariate normal regression theory requires that 
there be no errors in the independent variate. The nature of these 
errors, which will in fact be present in these data, will be discussed 
later. At this point it suffices to say that errors of measurement in the
independent variate will bias the results, if they occur, even though the
errors are random.
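
The direction of this bias can be illustrated with the standard errors-in-variables result. The formulation below is a textbook sketch under the classical assumptions (the error in the independent variate is random, with mean zero, and uncorrelated with both the true value and the error in the dependent variate); it is not necessarily the treatment given later in this paper, and all symbols are introduced here for illustration.

```latex
\begin{align*}
Y &= \alpha + \beta\,\xi + \varepsilon, \qquad X = \xi + \delta \\
\hat{\beta}_{Y \cdot X} \;&\longrightarrow\;
  \beta \, \frac{\sigma_{\xi}^{2}}{\sigma_{\xi}^{2} + \sigma_{\delta}^{2}}
\end{align*}
```

Here \xi is the true value of the independent variate, X its measured value, and \delta the error of measurement. The estimated slope is pulled toward zero by the factor shown, which approaches one when the error variance is small relative to the true variation among specimens.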

For reasons to be discussed later, femur length was selected as 
the independent variate, or standard of comparison, for the skeletal 
measurements. It had a very low variability among the replicated 
measurements, compared to the total variation among the different 
femurs, and hence did not introduce any appreciable measurement bias 
into the results. Errors of measurement in the other dimensions, the 
dependent variates, will not introduce a bias into the analysis, if they 
are random and uncorrelated with the true component of the measure- 
ment. They will merely reduce somewhat the precision of the analysis. 
For these reasons it did not seem necessary to replicate the skeletal 
measurements and to base the computations on their mean values. In- 
stead, a convention used by Dice (personal communication) to aid in avoid- 
ing errors, which at the same time gives something of a mean value, was 
adopted. Two measurements were taken. If the second differed from 
the first by one recording unit (here 0.01 mm), the first was recorded. 
If the two measurements differed by two units, the mean was recorded. 
If they differed by three or more units, additional measurements were 
taken until one or the other of the previous criteria could be satisfied. 
Only in the case of skull depth was it at all common for three or more 
measurements to be required. It almost never happened in measure- 
ments of the humerus or femur. Note that the recording unit used here 
for the skeletal measurements represents a value of one-tenth the unit 
usually used by mammalogists for these purposes. It does not seem 
likely that errors of measurement, still inevitably present, appreciably 
vitiated the results. 
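
The recording convention described above can be written as a small function. This is a sketch under two assumptions that the text does not state explicitly: identical readings are treated like readings one unit apart, and when further readings are needed each new reading is compared with the one immediately before it. The function name is hypothetical, and readings are held as whole numbers of recording units (0.01 mm) to avoid rounding problems.

```python
def recorded_value(readings):
    """Apply the recording rule to successive readings in units of 0.01 mm.

    Returns the value to record, in the same units, or None if the
    readings taken so far do not yet satisfy either criterion.
    """
    for previous, current in zip(readings, readings[1:]):
        difference = abs(current - previous)
        if difference <= 1:
            # Agreement within one unit: keep the earlier reading.
            return previous
        if difference == 2:
            # Two units apart: record the mean, which is a whole unit.
            return (previous + current) // 2
        # Three or more units apart: go on to the next reading.
    return None


print(recorded_value([1502, 1503]))        # readings one unit apart
print(recorded_value([1502, 1504]))        # readings two units apart
print(recorded_value([1502, 1506, 1507]))  # a third reading was required
```

A result of None signals that another measurement should be taken before a value is recorded.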

It has previously been said that the external measurements are 
actually the mean of three determinations. Variation in the three repli- 
cates was usually very low, indicating that the numerical values used in 
the computations lie close to the true means. For measurements of body 
length and tail length, the effective unit used is one-third that commonly 
employed, in addition to being a mean value. For measurements of hind- 
foot length and ear length, the effective unit is one-thirtieth of the con- 
ventional unit of one millimeter, although some workers do measure 
foot length and ear length to the nearest tenth of a millimeter. 

Variation among replicates taken several days or longer apart was 
sometimes greater than variation among replicates made at the same 
time. Superficial study indicated that these changes did not mark any 
trend, such as might result from a shrinkage of the bones as they dried. 
This is fortunate, since it is almost impossible to measure all bones at 
the same interval after death of the mouse. Donaldson (1924, Table 132) 
shows that the maximum shrinkage in any bones from freshly killed 
albino rats, until the bones were air-dried, amounted to 0.7 per cent of 
the length. Oven drying produced an additional 0.3 per cent shrinkage 
at most. In the present study the bones were thoroughly air-dried be- 
fore they were measured. In addition, individuals were not measured 
in any systematic manner, as for example all males first and then all
females, so it is presumed that any differences that exist, because the
bones were not measured on the same day, are essentially independent
of any attribute or grouping studied.

Further study of the phenomenon of periodic variability in measure- 
ments would be desirable, for it is entirely possible that the variation is 
solely a human error. Even with the aid of a low-power microscope, a 
certain amount of judgment is required to ascertain when the points of 
the micrometer are just touching the ends of a bone. The criterion 
could reasonably be expected to vary somewhat from day to day. It 
would always be wise to measure the material in some essentially ran- 
dom fashion, as done here, to avoid having changes in criteria possibly 
correlate with any grouping that might subsequently be used in the 
analysis. 

The ages of all but the field-caught mice are known to within a 
week, that of the majority probably to within two or three days. But the 
age of the mice is not an explicit part of the analysis and, as will be 
shown later, failure to make use of age data will not introduce any bias 
into the results. 


Comparisons 


The comparisons of the measurements that have been made were 
selected on several grounds and it is realized that the selection was 
arbitrary. The conventional external measurements were taken because 
they must be made at the time of death of the specimen. They are then 
available for study at any time at the discretion of the investigator. 
Body length seemed the best choice for the independent variate for com- 
parisons involving external measurements. 

There is literally an indefinite number of skull and skeletal meas- 
urements which can be taken. Femur length was selected for the inde- 
pendent variate because (1) Dice (1932) showed it to be less variable 
bilaterally than either the humerus or the innominate bone; (2) it seemed 
desirable to use as a standard of comparison a dimension that was not 
a composite of many bones; and (3), most important perhaps, the absolute 
change in length of the femur, over the age range studied, was greater 
than that of any other measurement of the skull or skeleton. This is a
very desirable attribute for the independent variate to possess. 

Of the dependent variates, the humerus was selected in order to 
give a comparison between two measurements on the postcranial skele- 
ton, a practice almost completely neglected by mammalian systematists, 
and in addition, because the difference in arboreal habits between the 
two races might be expected to show up here to some extent. Skull 
length was chosen because it seemed obvious that the two races differed 
in the relative length of the skull. It was of interest to determine whe- 
ther this relative difference was, in fact, due solely to the difference in 
size between the two races. Skull width and skull depth were selected 
because they encompass the other directions in which skulls could vary 
grossly and also because, upon inspection of a series, it was not apparent
that the two races differed in any way other than in actual size.

The measurements of the laboratory populations of Peromyscus 
were compared in three categories: between sexes, between races, and 
among sibships. The first comparisons necessary were to determine 
whether the sexes differ significantly in measurements. It was sufficient 
for this to compare female bairdi with male bairdi, and female gracilis 
with male gracilis. If the sexes differ, other comparisons must then be 
made separately for each sex. Of fourteen comparisons made between 
sexes within races, ten indicated differences at least at the 5 per cent 
level. Only one comparison between the sexes in bairdi and three between
the sexes in gracilis did not have any significant differences (see
Table VI). Furthermore, each of the seven pairs of comparisons indi- 
cated a sexual difference in at least one of the two races. Thus, at the 
outset it was shown to be necessary in further analyses to avoid com- 
bining the sexes. 

The second type of variation considered was that between the two 
races. This was the major reason for undertaking the study. The third 
type of variation that could be evaluated here to some extent was that 
among the several sibships of each race. Since a sizable proportion of 
the mice available were members of small sibships, some containing 
only one specimen, an analysis of the entire data from this viewpoint was 
not attempted. Only the larger sibships of each race have been selected 
for this kind of comparison. 

In bairdi there were six large sibships. The number of specimens 
for each sibship-sex group ranged from 13 to 29 for comparisons of 
skeletal measurements. The number available for comparisons of ex- 
ternal measurements was fewer in most cases. Actual figures for each 
group are given in Tables II through V, along with the summary of the 
analyses. 

In gracilis no adequate material for comparing sibships was avail- 
able, for there was only one large sibship and one of medium size. 
Hence, any sibship-sex group that numbered five or more individuals 
had to be included. This gave a total of five groups for each compari- 
son, except for the external measurements of the female gracilis, for 
which only four groups were available. 


ANALYSIS OF DATA 


Covariance analysis is discussed in most statistics texts. Here the 
notation used by Snedecor (1946) is employed because Snedecor’s text is 
frequently consulted by biologists. The alternative terminology of Reeve 
(1940) is indicated. Covariance analysis is applicable to two or more 
variates occurring in two or more groups. The simplest form has 
chiefly been used, that of only two variates in only two groups. In com- 
parisons among sibships, however, two variates in from four to six 
groups have been compared. 

The general procedure followed in the analysis will be given briefly
with references to the actual data. The analysis centers about the sample
regression coefficient. The regression coefficient, symbolized by
b_yx or simply by b, is a measure of the average rate at which the
mean numerical value of the dependent variate, Y, changes with respect
to a change in the numerical value of the independent variate, X,
in a sample. The regression coefficient is thus a measure of the slope
of the regression line of Y on X. The regression coefficient of X on
Y, b_xy, is not in general equal to the regression coefficient of Y on X
and is not an explicit part of covariance analysis.
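
The distinction between the two regression coefficients can be illustrated with a minimal sketch. The measurements below are hypothetical values chosen only for illustration; they are not taken from the present data, and the function name `slope` is likewise an assumption of the sketch.

```python
def slope(xs, ys):
    """Least-squares regression coefficient of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical femur (X) and humerus (Y) lengths in mm, for illustration only.
femur = [11.0, 12.0, 13.0, 14.0, 15.0]
humerus = [9.1, 9.6, 10.4, 10.9, 11.6]

b_yx = slope(femur, humerus)  # regression of Y on X
b_xy = slope(humerus, femur)  # regression of X on Y

print(f"b_yx = {b_yx:.4f} mm/mm, b_xy = {b_xy:.4f} mm/mm")
print(f"1 / b_yx = {1 / b_yx:.4f} (not equal to b_xy unless the fit is exact)")
```

The product of the two coefficients equals the squared correlation coefficient, so b_xy is the reciprocal of b_yx only when every point lies exactly on the line.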

In any comparison the samples are first tested to ascertain whether 
the populations from which they were drawn are likely to have the same 
regression coefficients. If the answer is negative, then the samples may 
be considered to represent different populations and the required infor- 
mation has been obtained. Upon comparing the regression coefficient of 
humerus length on femur length for bairdi females with that for bairdi 
males, for example, such a difference was detected (see Table I). The
coefficients in this case are 0.636588 mm/mm for the males, and
0.596021 mm/mm for the females. The difference is 0.040567 mm/mm,
which is significant at the 1 per cent level. The regression lines for 
these two samples are drawn, with others, in Figure 7. The range along 
the X-axis is the range covered by the samples. 

In some cases it will be found that, so far as can be ascertained, 
both samples could have been drawn from populations which had the 
same regression coefficient. The samples, nevertheless, can still
obviously be different, since their regression lines may be located at a
considerable distance from each other. In comparison of the regression
line for humerus length on femur length for bairdi females with
the same regression line for gracilis females, for example, this was 
the situation (see Figure 7). The actual regression coefficients were 
0.596021 mm/mm for the bairdi females, and 0.599192 mm/mm for the 
gracilis females. The difference is 0.003171 mm/mm (Table I). The 
test of significance indicated that this difference was well within the 
range of expected sampling fluctuation, but a reference to the figure 
will convince most persons that the two samples differ somewhat in 
their location on the graph. 

Since the two regression coefficients are not significantly different 
in this example, they may be averaged in order to obtain a more pre- 
cise estimate of the coefficient to apply to both samples. This average 
is not obtained directly, however. Reeve (1940) or Snedecor (1946) may
be consulted for details of the computational procedure. If the two re- 
gression lines are redrawn so that both have the same slope, but so that 
each line still passes through the mean of its sample, the resulting 
parallel lines will show graphically how far apart the two samples are. 
In this particular example the pair of parallel lines would be
indistinguishable from the individual regression lines (Figure 7). This,
however, is not always true.
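
The comparison of two regression coefficients and the computation of their common value can be sketched as follows. This is a standard textbook layout of the computation and is offered only as an illustration; it is not claimed to reproduce the exact worksheet of Snedecor or Reeve. The two samples are hypothetical, and the names `sums` and `compare_slopes` are assumptions of the sketch.

```python
def sums(xs, ys):
    """Return n, mean x, mean y, Sxx, Sxy, Syy for one sample."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, mx, my, sxx, sxy, syy


def compare_slopes(group1, group2):
    """Individual slopes, pooled slope, and F ratio (1 and n1+n2-4 d.f.)
    for the difference between the two regression coefficients."""
    n1, _, _, sxx1, sxy1, syy1 = sums(*group1)
    n2, _, _, sxx2, sxy2, syy2 = sums(*group2)
    b1 = sxy1 / sxx1
    b2 = sxy2 / sxx2
    # The common slope weights each sample by its Sxx; it is not the
    # simple average of b1 and b2.
    b_pooled = (sxy1 + sxy2) / (sxx1 + sxx2)
    # Residual sum of squares about two separate lines ...
    ss_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # ... and about two parallel lines sharing the pooled slope.
    ss_parallel = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_error = n1 + n2 - 4
    f_ratio = (ss_parallel - ss_separate) / (ss_separate / df_error)
    return b1, b2, b_pooled, f_ratio, df_error


# Hypothetical (femur, humerus) samples in mm, for illustration only.
sample_1 = ([11.0, 12.0, 13.0, 14.0, 15.0], [9.1, 9.6, 10.4, 10.9, 11.6])
sample_2 = ([13.0, 14.0, 15.0, 16.0, 17.0, 18.0],
            [10.9, 11.4, 12.1, 12.6, 13.3, 13.8])

b1, b2, b_pooled, f_ratio, df_error = compare_slopes(sample_1, sample_2)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, pooled b = {b_pooled:.4f}")
print(f"F for difference of slopes = {f_ratio:.3f} with 1 and {df_error} d.f.")
```

The F ratio would be referred to a table of F with 1 and n1 + n2 - 4 degrees of freedom; a nonsignificant value justifies the use of the pooled slope for both samples.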

The vertical distance between the two regression lines is easily
computed. A test of significance may be applied to the distance in order 
to determine whether it is greater than an amount which might reason- 
ably be ascribed to sampling fluctuation. In the present example, the 
distance between the regression lines is 0.2940 mm. This is greater 
than may be expected due to chance at the 1 per cent level of signifi- 
cance. It must be concluded, therefore, that mean humerus length of 
bairdi females differs from mean humerus length of gracilis females, 
if the two are compared at the same femur length. Such differences are 
termed adjusted mean differences (Snedecor, 1946) or position differ- 
ences (Reeve, 1940). The term adjusted mean difference refers to the 
fact that the means of both dependent variates have been computed at 
the same point along the X-axis. Whenever the numerical values of the 
adjusted means are given, it is customary that they be computed for the 
mean value of the independent variate. 
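
The adjusted mean difference and its test of significance can be sketched in the same way. The computation below follows the usual textbook form of covariance analysis for two groups and is given only as an illustration under that assumption; the samples are hypothetical, and the function names are not taken from any cited text.

```python
def sums(xs, ys):
    """Return n, mean x, mean y, Sxx, Sxy, Syy for one sample."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, mx, my, sxx, sxy, syy


def adjusted_mean(mean_x, mean_y, b, x0):
    """Mean of Y read from a line of slope b through the sample mean, at X = x0."""
    return mean_y + b * (x0 - mean_x)


def adjusted_mean_difference(group1, group2):
    """Vertical distance between parallel regression lines, with its F ratio
    (1 and n1+n2-3 d.f.) and the pooled slope used to draw the lines."""
    n1, mx1, my1, sxx1, sxy1, syy1 = sums(*group1)
    n2, mx2, my2, sxx2, sxy2, syy2 = sums(*group2)
    b = (sxy1 + sxy2) / (sxx1 + sxx2)
    diff = (my1 - my2) - b * (mx1 - mx2)
    # Error variance about the pair of parallel lines.
    ss_parallel = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_error = n1 + n2 - 3
    ms_error = ss_parallel / df_error
    var_diff = ms_error * (1 / n1 + 1 / n2 + (mx1 - mx2) ** 2 / (sxx1 + sxx2))
    f_ratio = diff ** 2 / var_diff
    return diff, f_ratio, df_error, b


# Hypothetical (femur, humerus) samples in mm, for illustration only.
sample_1 = ([11.0, 12.0, 13.0, 14.0, 15.0], [9.1, 9.6, 10.4, 10.9, 11.6])
sample_2 = ([13.0, 14.0, 15.0, 16.0, 17.0, 18.0],
            [10.9, 11.4, 12.1, 12.6, 13.3, 13.8])

diff, f_ratio, df_error, b = adjusted_mean_difference(sample_1, sample_2)
print(f"adjusted mean difference = {diff:.4f} mm")
print(f"F = {f_ratio:.2f} with 1 and {df_error} d.f.")
```

Because the two lines are parallel, the difference between the adjusted means is the same at whatever value of X they are computed; the choice of X affects only the adjusted means themselves.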

Summaries of all analyses and comparisons and the significance of 
differences are presented in Tables I through V. Conventional use of 
one asterisk to indicate significance at the 5 per cent level and two 
asterisks to indicate significance at the 1 per cent level has been fol- 
lowed. 

For comparisons of adjusted means when only two samples are 
compared, the important information is the difference between adjusted 
means. Only this difference is given in the tables, since adjusted 
means, when bairdi and gracilis are being compared, must be computed 
at the same femur length or body length. The resulting adjusted means 
would have little meaning for either race. 

For comparison of sibships (Tables II-V) only one race was in- 
volved in any comparison. With more than two samples in a comparison 
the range of extreme means does not furnish all the information neces- 
sary for visualization. Hence, the actual adjusted means are given. The 
inclusion of young animals in the samples makes the computed means 
smaller than the means for adults only, which are more easily inter- 
preted by many. The adjusted means, therefore, were computed at the 
estimated adult means, in round numbers, of the independent variates. 
These are 14.8 mm for bairdi femur length, 17.2 mm for gracilis femur 
length, 85 mm for bairdi body length, and 92 mm for gracilis body 
length. 

For easy reference where less detail is required, Tables I through 
V are condensed into one (Table VI) which indicates whether any signifi- 
cant difference was found between or among the various groups com- 
pared. For this purpose the distinction between regression coefficient 
differences and adjusted mean differences is ignored and, as a further 
simplification, the 5 per cent level is used as the sole criterion of sig- 
nificance. 

Before further discussion of these findings, and consideration of 
their application to biological problems, it will be wise to review the 
statistical assumptions inherent in covariance analysis and to inquire,
insofar as possible, how well these assumptions have been fulfilled by
the present data.

TABLE II


Regression Coefficients and Adjusted Means (if applicable) for 
Sibships of bairdi Females 


The upper statistic of each pair represents the estimated regression coefficient 
in mm/mm; the lower (in square brackets) the adjusted mean, computed for body 
length of 85 mm or femur length of 14.8 mm. The regression coefficient for the best- 
fitting set of parallel regression lines is given under Mean. The range of regression 
coefficients is given, with an indication of the level of significance, under Range. If no 
difference among regression coefficients was found, the range of the adjusted means is 
given, with an indication of the level of significance. Significance is interpreted to 
mean that at least two populations probably are represented in the samples. 


Humerus| Skull Skull Skull 
Length | Length Width Depth 


A 17 | .162715 094431 -609447 | .504036 | .196793 | .166889 


ae [17.775] [12.8696]1) cu sie [pb LOevO14d ie. oe 
B | 8-9] .113587 | .060929 .614456 | .853085 | .199858 | .197447 
pecan [18.406] [1.8672 || ecntee ols 10.5013] aos ee 
C | 14 | .384259 | .054861]. 20-21] .613336 | .721739 | .173508 | .302260 
poets [17.953] [22.0413], Rese 90: 105 7] ae 
D |18-19] .626006 | .054047 .518714 | .674081 | .223340 | .142066 
ear iate [18.474] [4127807] | Ss S534 eee 
E | 24 | .781576 | .059889 608580 | .855449 | .242616 | .126733 
SK [18.083] [iy 724 R) eeecee nl LOT 208 hie eee 
F | 17 | .601426 | .041711 .561190 | .792326 | .190267 | .103024 

TABLE III


Regression Coefficients and Adjusted Means (if applicable) for
Sibships of bairdi Males


The adjusted means are computed for body length of 85 mm or femur length of
14.8 mm (for other explanation, see Table II).


.060625 
[17.690] 


.293795 
[14.672] |} | (11.9649]] ...... [10.8313] 


.148808 
[6.8535] 


B .094350 | .231638 ||13-14] .644514 117489 | .196153 
[18.763] | [14.443] [1129426] (ee [10.6667] [6.5110] 

Cc .030200 | .107983 F .244964 | .241980 
[18.366] | [14.729] [12.1737]} ...... | [10.7695 ]] [6.6034] 

D .077238 | .059883 || 22” | .701489 131570 | .138845 
[18.610] | [14.150] [ea -9964))| be [10.7866]| [6.9436] 

E .051081 | .080681 590161 .241034 | .098012 
[18.075] | [14.366] Hid B230l len eee {10.7657]] [6.6886] 

F .037962 | .061154 .624085 .206175 | .142882 
[17.924] | [13.985] 120981) 285. 2. [10.8800]| [6.7743] 

.047301 | .067954 636097 212131 | .143587 

eases [18.146] | [14.354] {12.0071]| ...... | [10.7735]] [6.7286] 

Range .064150 | .187809 .201378 [.176306]} .143968 
Perce [1.073**]| [.744**] [eabO7t4 ll eeememle2139%)e1 43264%)| 


TABLE IV 


Regression Coefficients and Adjusted Means (if applicable) for 
Sibships of gracilis Females 


The adjusted means are computed for body length of 92 mm or femur length of 
17.2 mm (for other explanation, see Table II). 


Hind- | Ear ||,, iio Skull skull | Skull 


foot R d 

ie | Length Length Length | Width | Depth 
242711 Beal 8| .743296 | 1.068004 | .033927| .258501 
eeeteh sees [19.808] [13.4184] | [26.0263] | [11.0952] | [6.6206] 


et) | ee es 5 |-.071090 |-1.933649 | -.446682 | -.657582 
[13.8003] | [26.9091] | [11.1555] | [6.7259] 


D | 19 | .591756 | .045707 -045947 21 | .533439 | .662250 .120700 | .130888 


Lee as nee. Sucks [19.608] [13.6097] | [26.4323]] [11.2598] | [6.6613] 
E 7 |2.839835 | .183428 | .099857 9| .436920 | .629943 | .312119] .079600 
PETE | nna toes [18.928] [13.7817] | [26.2523] | [11.2982] | [6.7428] 
F 5 |2.105263 | .220647 | .023481 5 |1.953781 |2.148859 | -.313925 | -.338535 
ie Re [18.495] [13.4520] | [25.2877] | [11.1772] | [6.9388] 
1 

= 
Mean | .749673 | .087844 | .070414 || .. | .536873 | .692760 | .146570| .131209 
Se Gian [19.384] [13.6444] | [26.3981] |[11.2216]| [6.6740] 
Range |2.248079*| .197004*4 .180489 || .. | 2.024871 |4.082508 | .758801 .916083 

TABLE V


Regression Coefficients and Adjusted Means (if applicable) for
Sibships of gracilis Males


The adjusted means are computed for body length of 92 mm or femur length of
17.2 mm (for other explanation, see Table II).


.111367 | .023205 -.261609 

{89.968] | [21.012] |[19.384] [13.8984]| [26.2568] [11.4391] | [6.9723] 
326747


TABLE VI
Summary of Significant Differences in the Several Comparisons 


The distinction between regression coefficient differences and adjusted mean 
differences is not maintained. The 5 per cent level of significance is used through- 
out. Minus sign (-) indicates no significant difference; asterisk (*) significant at 
least at 5 per cent level. 


STATISTICAL CONSIDERATIONS


Linearity and Logarithms 


Many investigators have followed the lead of Huxley (1932) in as- 
suming that the relations between two variates should be expressed by 
the equation Y = bX^k. The usual method of fitting this equation to a
body of data has been to fit to the logarithmic transformation log Y = 
log b + k log X, which is in linear form. I have not seen a formal test 
for linearity applied to any published data of this type, but there have 
been several examples in the literature in which the data were pre- 
sented as scatter diagrams. In some a very good approximation to 
linearity appears to exist. In others the logarithmically transformed 
data still seem obviously curvilinear. Frequently, under such circumstances,
the investigator has been content to divide the data into two or
more groups and fit a different straight line segment to each group. 

Since there is no generally accepted theoretical formulation for 
growth, there seems to be no a priori reason to accept any particular 
system of transformation to apply to the variates. One purpose of a
transformation is to produce a pair of linearly related variates. Hence, 
the decision of which transformation to apply, if any, should stem from 
the nature of the data themselves. If the data are already linear, there 
is no reason for applying any transformation (but see Homogeneity of 
Error Variances, following). If the relation between two variates is 
correctly described by the equation Y = a + bX, then the transformation
of Y to log Y and X to log X will result in a nonlinear relation
except where the constant a is zero. This condition appears to apply to
the data presented here. If the value of a is small relative to the
minimum values of X and Y, the curvilinearity will be slight in logarithmic
form and with most biological data the difference would not likely be
appreciable enough to be detected. In other words, a formal linearity test
would probably indicate that both the original variates and the logarithms 
of the variates were linearly related. Where a is large relative to the 
minimum values of X and Y, curvilinearity appreciable enough to be 
detected by the eye may result from a logarithmic transformation. 
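
The effect of the constant a can be made concrete. For Y = a + bX the slope of log Y on log X at any point is bX/(a + bX), which is constant (and equal to one) only when a is zero. The sketch below evaluates this slope at several values of X; the values of a, b, and X are hypothetical and serve only to illustrate the argument.

```python
def loglog_slope(a, b, x):
    """Slope d(log Y)/d(log X) at X = x for the straight line Y = a + b*X."""
    return b * x / (a + b * x)


b = 0.6                      # hypothetical regression coefficient, mm/mm
x_values = [10.0, 13.5, 17.0]  # hypothetical range of the independent variate, mm

for a in (0.0, 0.5, 3.0):    # hypothetical intercepts, mm
    slopes = [loglog_slope(a, b, x) for x in x_values]
    spread = max(slopes) - min(slopes)
    print(f"a = {a:3.1f}: log-log slopes "
          + ", ".join(f"{s:.3f}" for s in slopes)
          + f"  (spread {spread:.3f})")
```

With a equal to zero the slope is the same everywhere; with a small the change in slope over the range is slight; with a large relative to bX the change is pronounced, which corresponds to visible curvature in the logarithmic plot.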

It might be said that for most data there are usually several curves 
available which would give equally good fits, in so far as one could dis- 
tinguish between them. This is particularly true if the range over 
which the independent variate is distributed is small. In such cases the 
effect on significance tests of the use of different curves would be slight 
and would not usually alter the conclusions which are drawn from the 
data. The use of an unjustified transformation may serve to explain 
some of the many cases reported in which the investigator found that a 
“break” occurred in the regression line. Others viewing the same data 
might consider the relation curvilinear rather than consisting of two or 
more separate linear portions. These supposed breaks, in mammals at 
least, have usually been located at the age of onset of sexual maturity. 
But this event does not occur suddenly and it does not take place over 
the same size range for every individual. The location of any break, 
therefore, would appear to be somewhat subjective. Changes in the
position at which the investigator locates a break will change the re- 
gression coefficients for the separate regression lines above and below 
the break and may affect the conclusions drawn from the data. 

The preceding discussion may apply to the work of Tanaka (1942). 
The basic data on which his extensive analysis of the variation in ex- 
ternal and skull measurements was based are presented in grouped 
form. Tanaka invariably used logarithms of both variates in his analy- 
sis. I have plotted in arithmetic units from his tables most of the com- 
parisons he has studied. With the exception of tail length against body 
length, all his data appear to be adequately described by the equation 
Y =a+bX, where X and Y are in the original arithmetic units of 
measurement. Tanaka did not think it necessary to use two line seg- 
ments in all comparisons. However, in some cases where he did not 
use two regression lines, the data in logarithmic form seem to be just 
as nonlinear as in the cases where he did use two lines. It is admitted- 
ly difficult to judge the data when only the array means are presented, 
and no precise test for departure from linearity can be made, but I be- 
lieve that in most of the comparisons studied by Tanaka regressions 
computed from the original data would be linear. 

Another example of tail length may be taken from Svihla (1934). 
Svihla studied the growth of Peromyscus maniculatus artemisiae from
birth to 60 days of age, at which time the mice were nearly full grown. 
His data are presented as graphs of total length and tail length against 
age in days. The curves are drawn to connect the mean values for each 
age (two-day intervals) rather than smoothed. The original fine divi- 
sions of the co-ordinate paper are reproduced, so that mean total length 
and tail length can be read from the graphs to within at least one milli- 
meter for each age. From these it is easy to construct data for deter- 
mining the regression of array means of tail length on body length. The 
trend of the means obtained is somewhat erratic; it is suggestive of a
sigmoid curve over the middle part of the regression, but the general 
trend is linear. The number of mice measured at each age varied from 
14 to 63. Svihla does not state how the difference in number was related 
to the age of the mice. In the absence of this and other information, it 
is useless to speculate whether the apparent deviations are likely to be 
real. The important point is that the general trend appears to be linear. 
The mean length of the tail at birth is given by Svihla as 11.30 mm; at 
age 60 days the mean value was 70.83 mm. The mean value of tail 
length for the thirteen parents of the young mice measured was 82.38 
mm. Svihla’s data thus cover approximately the middle three-fourths 
of the expected range of tail length from zero to final size. The mean 
body length at birth was given as 34.27 mm; at 60 days it was 82.14 mm. 
Mean body length for the parents was 89.23 mm. The change in body 
length observed was thus about 54 per cent of the range from zero to 
expected final size. Although the body averages rather large at birth 
relative to the tail, the observed growth in body length covered the
period in which the tail accomplished the majority of its growth. It would
seem likely that any consistent curvilinearity could be detected in these
data if any did exist.


Test for Linearity 


A requirement of the significance tests employed here is that the 
regressions must be linear. If a linear regression were fitted to curvi- 
linear data, the regression coefficient obtained would be a function of 
the distribution of the independent variate in the sample. The conclu- 
sions drawn from the data would thus obviously be affected. A formal 
test for departure from linearity is available. Actually, there are two 
slightly different tests, each applicable to somewhat different situations. 
In general, only one is presented in any standard text on statistics likely 
to be consulted by biologists. One of the tests (Snedecor, 1946, sec. 
14.4) is applicable to the common experimental situation in which the 
grouping of the independent variate is inherent in the data. If, for ex- 
ample, the heights of experimental plants are measured at intervals of 
two, four, six, eight, and ten weeks, and no intermediate values of time 
occur, no additional grouping of the data by age of plants is needed, 
where time is used as the independent variate. The other test (Goulden, 
1939, Chap. 12; Kenney and Keeping, 1951, sec. 11.2) applies to situa- 
tions in which the independent variate is more or less continuously dis- 
tributed in the sample and where, as a result, artificial grouping must 
be performed on the data after they have been collected. This situation, 
of course, applies to the data under consideration in this study. 

The computational method given by Goulden is easy to follow. It 
avoids the direct use of the means of the arrays in the computation, a 
practice that can lead to false conclusions unless a sufficient number of 
decimal places are carried. For the present data inspection of the scat- 
ter diagrams indicated that all regressions were almost certainly linear. 
The three that appeared most likely to show departures from linearity 
were formally tested. None showed significant departure from linear- 
ity. The results of the tests are given in Table VII and the associated 
graphs, which show how the array means deviate from their regression 
lines, in Figure 1. Scatter diagrams for two regressions, skull length 
on femur length for gracilis females and skull width on femur length for 
bairdi females, are given in Figures 2 and 3. 
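
The partition of variation used in a test of this kind can be sketched as follows. The sketch groups a continuously distributed independent variate into arrays, takes the array mean of X as the value for each array (an assumption of the sketch, not necessarily Goulden's computational layout), and separates the variation of Y into regression of means, deviation of means, and error within arrays. The data, the class boundaries, and the function name are hypothetical.

```python
def linearity_test(xs, ys, edges):
    """Partition the sum of squares of Y after grouping X into arrays."""
    # Assign each observation to an array according to the class boundaries.
    arrays = {}
    for x, y in zip(xs, ys):
        index = sum(1 for edge in edges if x >= edge)
        arrays.setdefault(index, []).append((x, y))

    n_total = len(xs)
    grand_x = sum(xs) / n_total
    grand_y = sum(ys) / n_total

    sxx = sxy = ss_between = ss_within = 0.0
    for points in arrays.values():
        n = len(points)
        mean_x = sum(p[0] for p in points) / n
        mean_y = sum(p[1] for p in points) / n
        sxx += n * (mean_x - grand_x) ** 2
        sxy += n * (mean_x - grand_x) * (mean_y - grand_y)
        ss_between += n * (mean_y - grand_y) ** 2
        ss_within += sum((p[1] - mean_y) ** 2 for p in points)

    ss_regression = sxy ** 2 / sxx            # regression of array means
    ss_deviation = ss_between - ss_regression  # deviation of means from the line
    df_deviation = len(arrays) - 2
    df_error = n_total - len(arrays)
    f_deviation = (ss_deviation / df_deviation) / (ss_within / df_error)
    return {
        "ss_regression": ss_regression,
        "ss_deviation": ss_deviation,
        "ss_within": ss_within,
        "df_deviation": df_deviation,
        "df_error": df_error,
        "f_deviation": f_deviation,
    }


# Hypothetical data: a straight line with small irregular departures.
xs = [10.0 + 0.25 * i for i in range(24)]
ys = [2.0 + 0.6 * x + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]

result = linearity_test(xs, ys, edges=[11.5, 13.0, 14.5])
print(result)
```

The F ratio for deviation of means is referred to a table of F with the degrees of freedom shown; a nonsignificant value gives no reason to doubt linearity.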

From Table VII it can be seen that the regressions of the means
are clearly significant and that they account for the majority of the total 
variation. This is particularly true for the regressions of skull length 
and humerus length on femur length, because the slope of the lines (re- 
gression coefficients) is high. In no case does the deviation of the ar- 
ray means from the appropriate regression line account for an appre- 
ciable amount of the variation. This lack of significance indicates that 
the three regressions may be considered linear. After these three 
tests had been made and no departures from linearity detected, it did 
not seem necessary to make formal tests of the remaining regressions. 
It would thus appear that any conclusions drawn from the regressions
computed for the present data will not be vitiated by lack of linearity.
The bivariate normal model can here be applied directly to the original
data without the need of any transformation.

FIG. 1. Regression lines for array means. Deviations of array
means from their regression lines are not significant in these examples.


TABLE VII 


Group and                                    Sum of       Mean
Regression          Source of Variation      Squares      Square        F

bairdi females,     Regression of Means      21.1174      21.1174     427**
Skull Width on      Deviation of Means         .5062        .0844      1.70
Femur Length        Error                     8.5878        .0496

gracilis males,     Regression of Means     163.3494     163.3494    1595**
Humerus Length      Deviation of Means         .3250        .1083      1.06
on Femur Length     Error                     9.3153        .1024

gracilis females,   Regression of Means     275.7295     275.7295     875**
Skull Length on     Deviation of Means        1.4303        .3576      1.13
Femur Length        Error                    33.0881        .3151
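
The variance ratios in Table VII can be recomputed from the printed mean squares, as in the brief check below. Each F is the mean square in question divided by the error mean square. The deviation ratios agree with the printed values to two decimals; for the first regression the printed mean squares give about 426 rather than 427, a small discrepancy whose cause cannot be determined from the printed figures. The row labels in the code are shortened for convenience.

```python
# (label, mean square for regression of means, for deviation of means, for error)
table_vii = [
    ("skull width on femur length (bairdi)", 21.1174, 0.0844, 0.0496),
    ("humerus length on femur length (gracilis)", 163.3494, 0.1083, 0.1024),
    ("skull length on femur length (gracilis)", 275.7295, 0.3576, 0.3151),
]

f_ratios = []
for label, ms_regression, ms_deviation, ms_error in table_vii:
    f_regression = ms_regression / ms_error
    f_deviation = ms_deviation / ms_error
    f_ratios.append((label, f_regression, f_deviation))
    print(f"{label}: F(regression) = {f_regression:.0f}, "
          f"F(deviation) = {f_deviation:.2f}")
```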


Homogeneity of Error Variances 


One of the requirements of the bivariate normal model is that the
error variance measured from the regression line be constant at all
points along the line. In particular, if this requirement does not obtain,
the tests of significance will be affected. Data satisfying this criterion
are said to be homoscedastic. An alternative situation that may reasonably
be expected is to find that the error variance increases as the
independent variate increases. Biologists, especially, may expect that
this would be the case, since it is well known that the variance of a
sample tends to increase as the mean increases. The use of the
coefficient of variation reflects this common property of biological data.
It does not necessarily follow, however, that the error variance about a
regression line must increase as X increases. If the error variance
does increase as X increases, logarithmic transformation of the data
will often make the variance homogeneous.

An approximate test can be made to determine whether the error
variance does change at different points along a regression line. If the 
data are divided into several groups along the X-axis, a series of sub- 
samples is available. To each subsample a regression line may be 
fitted and the error variances about the regression lines computed for 
the several subsamples. These error variances are subject to sam- 
pling error. Bartlett’s test for homogeneity of variances is available for 
ascertaining whether the differences among the several variances may 
reasonably be ascribed solely to sampling fluctuation. The procedure 
for making Bartlett’s test is here taken from Snedecor (1946, sec. 
10.13), with two slight modifications. In Snedecor's Table 10.22, for
s^2 substitute s^2_y.x, the error variance about the regression line. For
(k-1), the degrees of freedom, substitute (k-2), where k is the number
of observations in each subsample. Snedecor's example deals with
variances of univariate samples, and for each sample one degree of
freedom is lost by computation of the mean. For the modification used
here, two degrees of freedom are lost, one for the mean of Y and one
for the regression coefficient.

FIG. 2. Scatter diagram of skull length on femur length for gracilis females.
Least squares regression line for the individual points has been drawn. Regression
line for array means (Fig. 1) would be indistinguishable from the regression line for
all points. Error variance about the regression line here appears to increase as
femur length increases. In this respect the scatter diagram is not typical of the
majority of the data. A formal test indicates that the apparent increase of the error
variance is not significant.

FIG. 3. Scatter diagram of skull width on femur length for bairdi females.
Least squares regression line for the individual points has been drawn. Regression
line for the array means (Fig. 1) would be indistinguishable from the regression line
for all points. Error variances here appear to be homogeneous, which a formal test
confirmed. This scatter diagram may be considered typical of the majority of the
data.

Of all the regressions of skeletal measurements studied, only that of
skull length on femur length for gracilis females appeared to be
heteroscedastic. Examination of Figure 2 will reveal that the variance about
the regression line appears to increase as femur length increases. The
test for homogeneity of error variances was applied to this regression.
In addition, the regression of skull width on femur length for bairdi
females was also tested, even though there did not appear to be any
heteroscedasticity present (Figure 3). The results of the two tests are
summarized in Table VIII. Neither test indicates that heterogeneity of
the error variances is greater than may easily be ascribed to chance
alone. Therefore, for these data one may consider that the requirement
of homoscedasticity is adequately satisfied.
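
Bartlett's test with the modification described above can be sketched as follows. The formula is the usual one for Bartlett's chi-square, with k - 2 degrees of freedom taken for each subsample in place of k - 1. The error variances and subsample sizes in the example are hypothetical and are not the values of Table VIII; the function name is an assumption of the sketch.

```python
import math


def bartlett_regression(error_variances, sizes):
    """Bartlett's chi-square for homogeneity of error variances about
    regression lines fitted separately to each subsample.

    error_variances -- s^2_y.x for each subsample
    sizes           -- number of observations k in each subsample
    Returns the corrected chi-square and its degrees of freedom.
    """
    dfs = [k - 2 for k in sizes]  # two d.f. lost: mean of Y and slope
    total_df = sum(dfs)
    pooled = sum(f * v for f, v in zip(dfs, error_variances)) / total_df
    m = total_df * math.log(pooled) - sum(
        f * math.log(v) for f, v in zip(dfs, error_variances)
    )
    groups = len(sizes)
    correction = 1 + (sum(1 / f for f in dfs) - 1 / total_df) / (3 * (groups - 1))
    return m / correction, groups - 1


# Hypothetical subsamples along the X-axis, for illustration only.
chi2_equal, df = bartlett_regression([0.30, 0.30, 0.30], [20, 25, 30])
chi2_unequal, _ = bartlett_regression([0.10, 0.30, 0.60], [20, 25, 30])
print(f"equal variances:   chi-square = {chi2_equal:.4f} with {df} d.f.")
print(f"unequal variances: chi-square = {chi2_unequal:.4f} with {df} d.f.")
```

The resulting chi-square is referred to a table with one fewer degrees of freedom than the number of subsamples.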


TABLE VIII


Application of Bartlett's Test for Homogeneity of
Error Variances about the Regression Line


[Column headings: Group and Regression; Subsample Number; Grouping of
Independent Variate, mm; Error Variance (s^2_y.x), mm^2. Row groups:
gracilis females, Skull Length on Femur Length; bairdi females, Skull
Width on Femur Length.]


Use of Ratios


The practice of computing and comparing simple ratios has been a
very common one among systematists. Although Huxley (1932) and
others have clearly pointed out that this practice is not desirable, its
use has continued unabated. Snedecor (1946, sec. 6.14) discusses an
example. Use of simple ratios has in effect been a substitute for
regression analysis. A ratio is generally computed as Y/X, and thus
measures the magnitude of one mean relative to the other. However,
if the variates may be related by Y = a + bX, a change of one unit in X
causes a change in Y. The ratio ΔY/ΔX is constant over the entire
range of X for which the equation is descriptive, and is equal to the
regression coefficient, b. But the ratio Y/X varies as X changes, unless
the constant a is zero. If a were known to be zero, the use of ratios
would give considerable information. From a simple ratio, however,
one cannot determine the value of a.

It has been stated that if it is desired to compare a given ratio computed
for two different samples, it can be done if X1 = X2,¹ because then
total size differences will not affect the ratios. This is true to the
extent that if the ratios differ, then the regression lines for the two
samples will also differ. However, the ratios could be identical and yet it
may not be possible to distinguish between two samples which might
have clearly different regressions. Figure 4 illustrates this condition.
Although simple ratios are easy to compute, it would be wise to avoid
their use wherever possible.
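
Both points about ratios can be shown numerically. In the sketch below the intercepts, slopes, and values of X are hypothetical. The first part shows that Y/X changes along a single straight line when a is not zero, although ΔY/ΔX is constant; the second shows two lines of quite different slope that give the same ratio at the same value of X.

```python
def ratio(a, b, x):
    """Y/X for a point on the line Y = a + b*X."""
    return (a + b * x) / x


# One line with a nonzero intercept: the slope is constant, the ratio is not.
a, b = 3.0, 0.6
ratios = [ratio(a, b, x) for x in (10.0, 15.0, 20.0)]
print("Y/X along one line:", ", ".join(f"{r:.3f}" for r in ratios))

# Two lines with different slopes that cross at X = 15.
line_1 = (3.0, 0.6)    # Y = 3 + 0.6 X
line_2 = (-3.0, 1.0)   # Y = -3 + 1.0 X
x_common = 15.0
ratio_1 = ratio(*line_1, x_common)
ratio_2 = ratio(*line_2, x_common)
print(f"at X = {x_common}: Y/X = {ratio_1:.3f} and {ratio_2:.3f}")
```

Equal ratios at a common X therefore say nothing about whether the regression coefficients agree.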


¹ For = read: approximately equals.


FIG. 4. Condition in which the ratio X/Y may 
be equal for two radically different regression 
lines. 


Effect of Age 


The mice available for this study are of known age, but museum 
workers must deal almost entirely with material for which the age can 
only be guessed. Although there are some general clues to age, such as 
tooth wear (taking into account the nature of the food eaten) and suture 
obliteration, they give only an approximation. The truth of this is re- 
flected in the usual custom of dividing material of an obviously exten- 
sive age range into only three or four age classes. To be able to apply 
quantitative methods in which age need not be considered as a separate 
variable would be extremely desirable. That covariance analysis is 
suitable for this purpose can be shown for the present data without re- 
course to actual ages. This situation is commonly recognized by inves- 
tigators who have previously employed similar methods. The systema- 
tists, however, who would profit most from the method, have clung with 
insistence to the practice of dividing available material into age classes 
and making comparisons between samples for each age class separately 
or, worse, they have ignored the younger age classes entirely. There 
apparently is a conviction that somehow age must be considered in order 
to avoid serious bias. 

The relative unimportance of age has been indicated before; it is 
implicit in all papers previously cited which made use of similar analy- 
ses. But it would seem that the facts cannot be emphasized too much, 
for I have never seen a formal proof applicable to the type of linear 
equation employed here. Obviously, the two measurements under consideration 
in any given comparison possess the attribute of growth; that 
is, each may be regarded as a function of time, t,

$$y = f(t), \tag{1}$$

$$x = g(t), \tag{2}$$

and upon differentiating (1) and (2) with respect to time,

$$\frac{dy}{dt} = f'(t) \quad \text{and} \quad \frac{dx}{dt} = g'(t), \quad \text{whence}$$

$$\frac{dy}{dx} = \frac{f'(t)}{g'(t)}. \tag{3}$$

For any given point in time, $f'(t)/g'(t) = K$. If K is constant over some range of time, 
then (3) becomes for that range

$$\frac{dy}{dx} = K,$$

which upon integration gives

$$y = Kx + C.$$

This is the equation for the linear regression of the means 
of the y arrays. 

The previous integration will hold only over the range of time in 
which K is constant. If it can be shown for any body of data that the 
desired regression is a good approximation to a linear one, that is, if a 
formal test fails to show that the data are nonlinear, it may be assumed 
that K was constant over some range of time. For that range, and only 
for that range, may comparisons based on linear regressions be made. 
Such comparisons will not be biased merely because age has been 
omitted as an explicit part of the analysis. It will be noted that time 
cancels out of consideration only if for every individual the measure- 
ments of X and Y are made at the same point in time. This is the 
customary way of taking measurements in investigations such as the 
present one. If one measured the humerus, for example, at one age and 
the femur at another age, then the time factor would be extremely im- 
portant. 
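
A small numerical illustration of the argument may be useful. In the sketch below the growth curves `femur` and `skull` are hypothetical functions invented for the purpose (they are not fitted to the present data); they are built so that the ratio of their rates of growth is a constant K = 0.6. Measurements taken at arbitrary ages then fall on one straight line, and the ages themselves never enter the regression.

```python
import math


def femur(t):
    """Hypothetical growth curve g(t) for the independent variate, in mm."""
    return 16.0 * (1.0 - math.exp(-t / 20.0))


def skull(t):
    """Hypothetical f(t), built so that f'(t)/g'(t) = K = 0.6 at every age."""
    return 0.6 * femur(t) + 15.0


def least_squares(xs, ys):
    """Slope and intercept of the least squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Ages in days, deliberately irregular; both measurements are taken at the same age.
ages = [12, 18, 25, 40, 75, 150, 365]
xs = [femur(t) for t in ages]
ys = [skull(t) for t in ages]

slope, intercept = least_squares(xs, ys)
print(round(slope, 6), round(intercept, 6))
```

Any other set of ages within the range over which K is constant would give the same line.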

This is not to say that consideration of age may not increase the 
precision of the comparisons. Since in biological data there is usually 
an appreciable amount of variability that is not accounted for by any 
known hypotheses, any part of this that may be associated with some 
particular factor represents additional information. Specifically, in 
these data the variability of the several dependent variates has been 
associated, up to about 95 per cent in some cases, with the simple 
change in size of the independent variate. What can be done about the 
remaining variability? The answer seems to lie in the use of multiple 
regression, in which case age is used as an additional independent var- 
iate. Since it was possible to show significant differences between or 
among most of the larger groups tested, it did not seem necessary to 
apply multiple regression to these data. Reference is again made to 
the fact that the most common application of this method is expected to 
be to museum material, for which the ages of the specimens cannot be 
known. 


Errors in Independent Variate 


A summary of the notation to be used may be helpful at this point. 
Insofar as possible the basic notation follows that of Snedecor (1946). 
An upper case letter, for example X, is used to denote an original 
measurement. A lower case letter denotes a deviation from the mean 
of a series of numbers, $x = (X - \bar{X})$. Any summation is indicated by S 
(the Greek letter Σ is often used for the purpose for which S is used 
here). The sum of the squares of the deviations from their mean, 
usually shortened to “sum of squares,” is denoted by $S_x^2$. The sample 
estimate of the population variance is denoted by $s_x^2$. The population 
variance is $\sigma_x^2$. 

Bivariate normal regression theory requires that the independent 
variate be free from errors. This is equally true for errors of mea- 
surement and for what, in the present instance, might be called errors 
of ontogenetic determination. The latter may be thought of as random 
changes which affect the independent variate without having any effect 
on the dependent variate. The relations may be illustrated by the fol- 
lowing scheme, which is essentially the model used by Kenney and Keep- 
ing (1951, sec. 8.11) to discuss the effect of errors in the independent 
variate. 


Here C represents the common cause, for example growth factors, that 
produces variation in $\eta$ and $\xi$, and m is a constant that denotes the 
differential effect on $\eta$ relative to $\xi$. The quantity e is the error 
associated with $\eta$, and d is the error contained in $\xi$. Tukey (1951) has 
called $\eta$ and $\xi$ the steady parts, and d and e the fluctuations. One may 
arbitrarily take $\eta$ to be the steady part of the dependent variate, and $\xi$ 
the steady part of the independent variate, so that 

$$X = \xi + d, \quad \text{and} \tag{4}$$

$$Y = \eta + e. \tag{5}$$


The model further requires that d and e be normally and independently 
distributed with means zero and variances $\sigma_d^2$ and $\sigma_e^2$, respectively. 
This means that the fluctuations or errors will be uncorrelated with the 
steady parts and with each other. In the adaptation of the model con- 
sidered, the determination of the steady parts by C is complete, or 


$$\rho_{\xi\eta} = 1. \tag{6}$$


This is one of the cases considered by Tukey (1951) in which the “vari- 
ance component between degenerates along a line.” 

The next three equations follow directly from (4), (5), and the inde- 
pendence of the fluctuations. 


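$$\sigma_x^2 = \sigma_\xi^2 + \sigma_d^2, \tag{7}$$

$$\sigma_y^2 = \sigma_\eta^2 + \sigma_e^2, \tag{8}$$

$$\sigma_{xy} = \sigma_{\xi\eta}. \tag{9}$$


In practice the values of the $\sigma^2$’s will be estimated from the sample 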
variances. Henceforth, the sample estimates of the several variances 
will be employed. 

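The regression coefficient, $\beta$, of $\eta$ on $\xi$ may be estimated by 

$$\beta = \frac{s_{\xi\eta}}{s_\xi^2}. \tag{10}$$

$\beta$ may also be estimated from the sums of squares, 

$$\beta = \frac{S_{\xi\eta}}{S_\xi^2}. \tag{10a}$$

The observed regression coefficient, b, of Y on X is 

$$b = \frac{s_{\xi\eta}}{s_\xi^2 + s_d^2} = \frac{s_{xy}}{s_x^2}. \tag{11}$$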


From (10) and (11) it is seen that the expected value for the regression 
coefficient of Y on X is less than the expected value for the regression 
coefficient of $\eta$ on $\xi$, or 

$$E[b] < E[\beta]. \tag{12}$$
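
The inequality can be illustrated by simulation. The following is a minimal sketch under assumed values: the slope `BETA`, the three standard deviations, the sample size, and the random seed are all chosen for illustration and have no connection with the mouse data. With a steady-part variance of 4 and an error variance of 1 in X, equation (11) leads one to expect an observed coefficient near 0.5 × 4/5 = 0.4.

```python
import random


def slope(xs, ys):
    """Least squares regression coefficient of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1951)           # fixed seed so the run is repeatable
BETA = 0.5                          # assumed slope relating the steady parts
SD_XI, SD_D, SD_E = 2.0, 1.0, 0.5   # assumed standard deviations of xi, d, e
N = 20000

xi = [rng.gauss(10.0, SD_XI) for _ in range(N)]   # steady part of X
eta = [3.0 + BETA * v for v in xi]                # steady part of Y
X = [v + rng.gauss(0.0, SD_D) for v in xi]        # equation (4)
Y = [v + rng.gauss(0.0, SD_E) for v in eta]       # equation (5)

b_steady = slope(xi, Y)     # regression on the error-free steady part
b_observed = slope(X, Y)    # regression on the observed, error-laden X
expected_b = BETA * SD_XI ** 2 / (SD_XI ** 2 + SD_D ** 2)

print(round(b_steady, 3), round(b_observed, 3), expected_b)
```

The observed coefficient is pulled toward zero by the factor $s_\xi^2 / (s_\xi^2 + s_d^2)$, which is the content of (11) and (12).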


If a theory which specifies the magnitude of $s_d^2$ is available, the 
regression coefficient relating the steady parts may be determined 
directly. Usually, however, as is the case in this study, there is no 
basis for knowing $s_d^2$. It must be estimated from the data. 

The amount by which $\beta$ differs from b is not only a function of $s_d^2$, 
but also involves $s_e^2$. The critical item is the ratio of the error 
variances, which may be called lambda, following Kenney and Keeping 
(1951): 

$$\lambda = \frac{s_d^2}{s_e^2}. \tag{13}$$


A number of authors have investigated the problem of fitting a re- 
gression line where the independent variate is subject to error. Some 
of them make use of a third variate, called an instrumental variate. 
Tukey (1951) has given a comprehensive summary of the problem. Other 
authors, among whom may be mentioned Wald (1940) and Bartlett (1949) 
have provided a solution which does not require a third variate. Bart- 
lett’s method is an improvement over Wald’s, and has been used here 
to estimate the regression coefficient relating the steady parts. While 
the method was presented for the case in which the independent variate 
is evenly distributed along the X-axis, it should be approximately ap- 
plicable to cases in which this condition is not fulfilled. 

The procedure may be summarized briefly. The data are divided 
into three groups along the X-axis, the groups being as nearly equal in 
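number as possible. The middle group is not considered. The regression 
coefficient relating the steady parts is then computed as 

$$\beta = \frac{\bar{Y}_3 - \bar{Y}_1}{\bar{X}_3 - \bar{X}_1}, \tag{14}$$

where the subscripts 1 and 3 denote the lowest and the highest of the three groups.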


The estimate obtained will be an unbiased estimate of the regression 
coefficient under two conditions: 


a. The presence of errors in the independent variate must not in- 
fluence the grouping, and 
b. The regression must be linear. 


It will be noted that the first condition could not be completely satisfied 
where the variates are essentially continuously distributed along the 
X-axis, as is the case with the data used here. However, with a large 
sample, the number allocated incorrectly would be a small proportion 
of the total and the bias consequently slight. Collection of the data over 
a considerable range of X will also aid materially in reducing the bias. 
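
The grouping procedure just summarized can be sketched as follows. This is an illustration of the three-group computation only, not the routine used for the present data: the function name is hypothetical, each extreme group is given one third of the specimens (rounded down), and the measurements in the usage lines are invented.

```python
def bartlett_slope(xs, ys):
    """Three-group slope: difference of the Y means of the two extreme groups
    divided by the difference of their X means. The middle group is ignored."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    pairs = sorted(zip(xs, ys))       # order the pairs along the X-axis
    size = len(pairs) // 3            # specimens in each extreme group
    lower, upper = pairs[:size], pairs[-size:]

    def mean(values):
        return sum(values) / len(values)

    x_lo, y_lo = mean([p[0] for p in lower]), mean([p[1] for p in lower])
    x_hi, y_hi = mean([p[0] for p in upper]), mean([p[1] for p in upper])
    return (y_hi - y_lo) / (x_hi - x_lo)


# Invented femur lengths (mm) and a dependent variate lying exactly on Y = 3 + 2X.
xs = [9.1, 10.4, 11.0, 12.2, 13.5, 14.1, 15.3, 15.9, 16.4]
ys = [3.0 + 2.0 * x for x in xs]
print(bartlett_slope(xs, ys))
```

When the number of specimens is not divisible by three, this sketch places the remainder in the discarded middle group, which is one of several reasonable ways of making the groups "as nearly equal in number as possible."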
In the application of Bartlett’s formula to the present data, it developed 
that in a number of cases $\beta$ was smaller than the least squares 
estimate of b, contrary to the expectation of (12). While this indicates 
that $s_d^2$ is small, it would be desirable to obtain several estimates of 
$s_d^2$ and average them. In the event that the middle portion of the data 
had a slightly greater slope (regression coefficient) than did the extreme 
parts, it would be expected that b might be greater than $\beta$. Even 
though the departure from linearity was not great enough to be detected 
by a formal test, this effect would still be present. Several situations 
are represented diagrammatically in Figure 5. 


FIG. 5. Effect of slight irregularities on the expected value of b. 
Upper line: $E[b] < E[\beta]$. Middle line: $E[b] > E[\beta]$. 
Lower line: $E[b] < E[\beta]$. 

Furthermore, the chance inclusion in the sample of an extreme 
deviation which is smaller than expectation at the lower range of X, or 
larger than to be expected at the upper range of X, will serve to increase 
the estimate of b proportional to the square of the deviation, 
whereas it will increase $\beta$ proportional only to the first power of the 
deviation. 

The present data afford a means of obtaining average estimates of 
$s_d^2$ that are hence less subject to sampling error. For the skeletal 
measurements, four different regressions were computed, each on the 
same independent variate, femur length. If it may be assumed that the 
following scheme, an extension of the previous one, holds approximately, 

$$e_1 \rightarrow \eta_1 \leftarrow m_1$$
$$e_2 \rightarrow \eta_2 \leftarrow m_2$$
$$e_3 \rightarrow \eta_3 \leftarrow m_3$$
$$e_4 \rightarrow \eta_4 \leftarrow m_4$$
$$d \rightarrow \xi$$

then for each regression $\beta_i$ may be obtained from (14). Equation (10) 
provides for estimation of $s_\xi^2$, and $s_d^2$ may then be computed from (7), 
using sample variances instead of the population parameters. Hence, 
for each group for which these four regressions are available, four 
estimates of $s_d^2$ can be obtained. For the present data, these four 
estimates are available for each of the four race-sex groups. The estimates 
are presented in Table IX. 
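
The chain of computations from (14), (10), and (7) can be written out as a short sketch. The function names are hypothetical and the six data points are invented so that the arithmetic can be followed by hand. The sketch reproduces the arithmetic only and says nothing about the stability of the estimate, which depends heavily on sample size.

```python
def sample_var(values):
    """Sample estimate of variance (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def sample_cov(xs, ys):
    """Sample covariance (divisor n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def bartlett_slope(xs, ys):
    """Three-group slope of (14); the middle third of the data is ignored."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // 3
    lower, upper = pairs[:size], pairs[-size:]
    x_lo = sum(p[0] for p in lower) / size
    y_lo = sum(p[1] for p in lower) / size
    x_hi = sum(p[0] for p in upper) / size
    y_hi = sum(p[1] for p in upper) / size
    return (y_hi - y_lo) / (x_hi - x_lo)


def error_variance_of_x(xs, ys):
    """Estimate s_d^2: beta from (14), s_xi^2 = s_xy / beta from (10) with (9),
    then s_d^2 = s_x^2 - s_xi^2 from (7)."""
    beta = bartlett_slope(xs, ys)
    s_xi2 = sample_cov(xs, ys) / beta
    return sample_var(xs) - s_xi2


# Invented example: beta = 1.375, s_xy = 4.4, s_x^2 = 3.5, so s_xi^2 = 3.2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 2.0, 3.0, 6.0, 5.0]
print(round(error_variance_of_x(xs, ys), 4))
```

Because $s_d^2$ is obtained as the small difference between two much larger quantities, a slight change in the estimated $\beta$ can change its sign, which is in keeping with the advice above to average several estimates.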


TABLE IX 


Estimates of $s_d^2$ in mm² Derived from the Several Regressions 
of $Y_i$ on Femur Length 


; bairdi gracilis 


Humerus Length ......... -.1058†
Skull Length ........... -.2967 
Skull Width ............ -.2921 


-.0279 
96 


.0167 -0620 -1039 
181-182 180-182 111-112 


No. in Sample 


bairdi: Unweighted Mean .0394; Weighted Mean .0394 
gracilis: Unweighted Mean .0380; Weighted Mean .0448 
bairdi plus gracilis: Unweighted Mean .0387; Weighted Mean .0407 


† Formal test for linearity was made for this regression. 


The error variance estimates for gracilis, based on approximately 
half the number of individuals, are much less stable than those for 
bairdi. It is of interest, however, to see how well the mean $s_d^2$’s of the 
races agree. An estimate of $s_d^2$ will be needed later. The nature of 
the use does not require more than that the order of magnitude of the 
variance be known. For this use the total weighted mean will be accepted, 

$$s_d^2 = .0407 \text{ mm}^2. \tag{15}$$

Tests of significance can be used to answer two different questions 
in so far as the difference between regression coefficients for two samples 
is concerned. These questions are: 


a. Could the samples have been drawn from populations which have 
different b’s? or, 


b. Could the samples have been drawn from populations which have 
different $\beta$’s? 


The first question can under certain circumstances be answered cor- 
rectly by the conventional tests of significance employed in covariance 
analysis. If it is ascertained that the observed regression coefficients 
are significantly different, then it may be presumed that the populations 
are different. However, it would not be known whether the difference is 
due to their having different $\beta$’s, or different $s_d^2$’s, or perhaps both. 
This situation is analogous to the use of the Student-Fisher t test for the 
difference between two sample means. In the event that it is not known 
that the population variances are the same, a significant t indicates 
only that either the means are different, or that the variances are dif- 
ferent, or possibly both. Thus, although the populations may be pre- 
sumed to be different, it cannot be stated that they differ in their means. 

A precaution that must be observed in making and interpreting such 
a test for differences between b’s is that the two samples must have ap- 
proximately the same observed variances of the independent variate, 
and these must be reasonably large. Radical differences between the 
$s_x^2$’s can cause two samples from the same population to be invariably 
shown to be significantly different. This is discussed in the next sec- 
tion. 

The second question is the one that has generally been treated in 
the literature. It is also the question that those who have ignored the 
effect of errors in the independent variate have apparently assumed was 
being answered by their analysis. 

There are two kinds of errors which may be made in any statements 
based on probability analysis. One type comprises the mistaken conclu- 
sions that two samples represent different populations. These are called 
“errors of the first kind.” The second type comprises the errors aris- 
ing from the fact that an existing difference between the populations may 
not have been detected. These are called “errors of the second kind.” 


Errors of the First Kind 


Normally, the errors of the first kind are set by the choosing of the 
confidence level, which is at the option of the investigator. In correct 
application of covariance analysis the chosen level will be the true one. 
However, in testing for differences between regression coefficients in 
cases where the independent variates are subject to error, it cannot al- 
ways be assumed that the chosen confidence level is the one which 
actually applies. 

One may legitimately ask this question: If two samples are drawn 
from the same population so that the variances of the independent vari- 
ate are radically different for the two samples, what will be the effect 
on errors of the first kind? The question has considerable practical 
importance, since most of the collections of material similar to that 
used in this investigation consist of adult specimens only. In such cases 
$s_x^2$ will be small. If the material used here were compared with such a 
collection, it is not unlikely that the samples would have observed re- 
gression coefficients which would be shown to be significantly different. 
Can these differences be relied upon? 

A completely general answer to the question has not been found, but 
a special case which will cover many situations has been investigated. 
The restrictions placed on the special case are: 


a. Two samples have been drawn from the same population. 

b. The samples have been so drawn that the observed variances of 
the independent variates are different for the two samples. 

c. The numbers in the two samples are equal. 

d. Only the expected values are considered. Sampling fluctuation 
has not been evaluated. 


In spite of the fact that the $\beta$’s are the same for the two samples, the 
presence of errors in the independent variate will in general cause a 
difference between the b’s to exist, and the test of significance will not 
distinguish the cause of the difference. The behavior of this model, 
which is an extension of the preceding one on page 28, is described. 
Sums of squares are used throughout. From equations (7) through (9), 

$$S_x^2 = S_\xi^2 + S_d^2, \tag{16}$$

$$S_y^2 = S_\eta^2 + S_e^2, \tag{17}$$

$$S_{xy} = S_{\xi\eta}. \tag{18}$$


It will be recalled that for the model the correlation between the steady 
parts, $\eta$ and $\xi$, as affected by the common cause is unity (equation 6), 
although the observed correlation may be far different. Because of the 
complete determination of the steady parts, and since 

$$\rho_{\xi\eta}^2 = \frac{(S_{\xi\eta})^2}{S_\xi^2 S_\eta^2} = 1,$$

it follows from (18) and (10a) that the steady parts may be related by 

$$S_\eta^2 = \beta^2 S_\xi^2 \quad \text{and} \quad S_{\xi\eta} = \beta S_\xi^2.$$


For any given sample, the ratio of the steady part sum of squares to the 
fluctuation sum of squares is fixed. Define k so that 


$$S_\xi^2 = k S_d^2. \tag{19}$$


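Then (16) through (18) become 

$$S_x^2 = (k + 1) S_d^2,$$

$$S_y^2 = \beta^2 k S_d^2 + S_e^2,$$

$$S_{xy} = \beta k S_d^2.$$

The sum of squares about two separate lines, each one being the 
least squares fit for its own sample, is 

$$S_{y \cdot x\,(\mathrm{sep})}^2 = S_{y_1}^2 - \frac{(S_{xy_1})^2}{S_{x_1}^2} + S_{y_2}^2 - \frac{(S_{xy_2})^2}{S_{x_2}^2}. \tag{24}$$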


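From the restrictions placed on this special case, although the expected 
values are 

$$S_{x_1}^2 \neq S_{x_2}^2, \quad S_{y_1}^2 \neq S_{y_2}^2, \quad S_{\xi_1}^2 \neq S_{\xi_2}^2, \quad \text{and} \quad k_1 \neq k_2,$$

nevertheless, 

$$S_{d_1}^2 = S_{d_2}^2 = S_d^2, \quad S_{e_1}^2 = S_{e_2}^2 = S_e^2, \quad \text{and} \quad \beta_1 = \beta_2 = \beta.$$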


Using these relations and the definition equation, from (24) it follows 
that 

$$S_{y \cdot x\,(\mathrm{sep})}^2 = 2 S_e^2 + \frac{\beta^2 S_d^2 (2 k_1 k_2 + k_1 + k_2)}{(k_1 + 1)(k_2 + 1)}. \tag{25}$$


The sum of squares about the two best fitting parallel lines, each line 
passing through its own sample mean, is 

$$S_{y \cdot x\,(\mathrm{par})}^2 = S_{y_1}^2 + S_{y_2}^2 - \frac{(S_{xy_1} + S_{xy_2})^2}{S_{x_1}^2 + S_{x_2}^2}. \tag{26}$$

By substitutions similar to those used above, one finds 

$$S_{y \cdot x\,(\mathrm{par})}^2 = 2 S_e^2 + \frac{2 \beta^2 S_d^2 (k_1 + k_2)}{k_1 + k_2 + 2}. \tag{27}$$


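The significance of the difference between the regression coefficients is 
measured by the variance ratio, F, 

$$F = \frac{\left( S_{y \cdot x\,(\mathrm{par})}^2 - S_{y \cdot x\,(\mathrm{sep})}^2 \right) / 1}{S_{y \cdot x\,(\mathrm{sep})}^2 / (N - 4)}.$$

This becomes 

$$F = \frac{\beta^2 S_d^2 (k_1 - k_2)^2 (N - 4)}{(k_1 + k_2 + 2) \left[ 2 S_e^2 (k_1 + 1)(k_2 + 1) + \beta^2 S_d^2 (2 k_1 k_2 + k_1 + k_2) \right]}. \tag{28}$$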


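Since $\beta$, $S_d^2$, and $S_e^2$ are properties of the population sampled, they 
may be collected into a single constant. Thus 

$$\frac{F}{N - 4} = \frac{\dfrac{\beta^2 S_d^2}{S_e^2} (k_1 - k_2)^2}{(k_1 + k_2 + 2) \left[ 2 (k_1 + 1)(k_2 + 1) + \dfrac{\beta^2 S_d^2}{S_e^2} (2 k_1 k_2 + k_1 + k_2) \right]},$$

and since from (13), $S_d^2 / S_e^2 = \lambda$, 

$$\frac{F}{N - 4} = \frac{\lambda \beta^2 (k_1 - k_2)^2}{(k_1 + k_2 + 2) \left[ 2 (k_1 + 1)(k_2 + 1) + \lambda \beta^2 (2 k_1 k_2 + k_1 + k_2) \right]}. \tag{29}$$

Now let $\lambda \beta^2 = Q$. Then 

$$\frac{F}{N - 4} = \frac{Q (k_1 - k_2)^2}{(k_1 + k_2 + 2) \left[ 2 (k_1 + 1)(k_2 + 1) + Q (2 k_1 k_2 + k_1 + k_2) \right]}. \tag{30}$$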


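The changes in F/(N - 4) must be studied as $k_1$ and $k_2$ assume different 
values. Let 

$$R = k_2 / k_1, \quad 0 < R < 1. \tag{31}$$

Equation (30) then becomes 

$$\frac{F}{N - 4} = \frac{Q k_1^2 (1 - R)^2}{k_1^3 [2R(R+1)(Q+1)] + k_1^2 [(R+1)^2 (Q+2) + 4R(Q+1)] + k_1 [(R+1)(2Q+6)] + 4}. \tag{32}$$

Equation (32) is cumbersome because of the three variables involved. 
However, it can be seen directly from (32) that as Q approaches zero, 
F/(N - 4) also approaches zero. Now if numerator and denominator of (32) 
be divided by Q, it becomes 

$$\frac{F}{N - 4} = \frac{k_1^2 (1 - R)^2}{k_1^3 \left[ 2R(R+1)\left(1 + \tfrac{1}{Q}\right) \right] + k_1^2 \left[ (R+1)^2 \left(1 + \tfrac{2}{Q}\right) + 4R\left(1 + \tfrac{1}{Q}\right) \right] + k_1 \left[ (R+1)\left(2 + \tfrac{6}{Q}\right) \right] + \tfrac{4}{Q}}.$$

Now as Q becomes very large, 

$$\frac{F}{N - 4} \rightarrow \frac{k_1^2 (1 - R)^2}{k_1^3 [2R(R+1)] + k_1^2 [(R+1)^2 + 4R] + k_1 [2(R+1)]}.$$

The numerical value of F/(N - 4) thus does not approach any fixed limit 
independent of $k_1$ and R as Q increases. 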

From (32) it can be seen directly that as R approaches unity, 
F/(N - 4) approaches zero. By letting R approach zero, it follows that 

$$\frac{F}{N - 4} \rightarrow \frac{Q k_1^2}{k_1^2 (Q + 2) + k_1 (2Q + 6) + 4},$$

which again is some finite value dependent on the other variables. 
If both numerator and denominator of (32) are divided by $k_1^2$, it 
becomes 

$$\frac{F}{N - 4} = \frac{Q (1 - R)^2}{k_1 [2R(R+1)(Q+1)] + (R+1)^2 (Q+2) + 4R(Q+1) + \dfrac{(R+1)(2Q+6)}{k_1} + \dfrac{4}{k_1^2}}.$$

Now for any nonzero value of Q, as $k_1$ approaches zero the numerator 
remains fixed, and the denominator (due to the last two terms) becomes 
very large. Hence F/(N - 4) approaches zero. If, on the other hand, $k_1$ 
becomes very large, the numerator again remains fixed, while the 
denominator (due to the first term) again becomes very large. Hence 
F/(N - 4) again approaches zero. Positive values of F/(N - 4) will lie 
between the limits for $k_1$ of zero and plus infinity. 

Since F/(N - 4) approaches zero as $k_1$ varies in two directions, it 
is desirable to determine what values of $k_1$ make F/(N - 4) maximum. 
To determine this, the partial derivative of F/(N - 4) may be taken with 
respect to $k_1$, equated to zero, and for any selected Q and R the solution 
gives the value of $k_1$ for which F/(N - 4) is maximum. This is 

$$\frac{\partial [F/(N - 4)]}{\partial k_1} = \frac{-k_1^3 [2R(R+1)(Q+1)] + k_1 [(R+1)(2Q+6)] + 8}{\text{denominator}} = 0. \tag{33}$$

Since the equation is set equal to zero, the denominator need not be 
computed. 

It is desirable to have practical limits to Q to use in evaluating 
(33). If the independent variate has been chosen so that its range of 
variation is greater than that of any of the dependent variates used, the 
absolute magnitude of $\beta$ will in general be less than unity. A more 
liberal limit is to suppose $-2 < \beta < +2$. If it may be assumed that 
$0 < \lambda < +4$, then $0 < Q < +16$, since from (29), $Q = \lambda \beta^2$. R previously 
has been bounded by definition, $0 < R < 1$ (31). Solution of (33) for various 
combinations of Q and R indicates that F/(N - 4) is always maximum 
between $k_1 = 1$ and $k_1 = 6$. It approaches the larger number only 
for combinations of very small Q and small R, i.e., Q < .05, R < .1. 
For most combinations the maximum lies in the vicinity of $k_1 = 1$. 

Thus, to minimize the spurious component in F, the choice lies in 
choosing $k_1$ either very small or else of some reasonable magnitude, 
say $k_1 > 50$. It must be remembered that the independent variate need 
not be randomly selected, only that the sampling for the dependent variate 
be random. Hence, within limits it is possible to choose whatever 
$k_1$ is desirable. In general, the greater $s_x^2$ is made, the greater the 
k’s will be. Small k’s could be chosen, but this would for most applications 
severely limit the sample size. The choice of large k’s thus seems 
to be the only reasonable answer. 
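
The location of the maximum can be checked numerically. The sketch below codes the expected F/(N - 4) in its two forms, (30) and (32), and finds the positive root of the numerator of (33) by bisection. The function names and the bisection solver are illustrative choices, not the computing procedure used in this study, and the solver assumes R is not extremely close to zero.

```python
def f_ratio_30(k1, k2, q):
    """Expected F/(N-4) from equation (30); q stands for Q = lambda * beta**2."""
    num = q * (k1 - k2) ** 2
    den = (k1 + k2 + 2) * (2 * (k1 + 1) * (k2 + 1) + q * (2 * k1 * k2 + k1 + k2))
    return num / den


def f_ratio_32(k1, r, q):
    """The same quantity written as equation (32), with R = k2/k1."""
    num = q * k1 ** 2 * (1 - r) ** 2
    den = (k1 ** 3 * 2 * r * (r + 1) * (q + 1)
           + k1 ** 2 * ((r + 1) ** 2 * (q + 2) + 4 * r * (q + 1))
           + k1 * (r + 1) * (2 * q + 6)
           + 4)
    return num / den


def k1_at_maximum(r, q, lo=1e-6, hi=1e4):
    """Positive root of the numerator of (33), located by bisection."""
    def numerator(k1):
        return (-k1 ** 3 * 2 * r * (r + 1) * (q + 1)
                + k1 * (r + 1) * (2 * q + 6) + 8)

    # The numerator is positive near zero and negative for large k1.
    for _ in range(200):
        mid = (lo + hi) / 2
        if numerator(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k_small_q = k1_at_maximum(0.1, 0.05)   # very small Q and small R
k_large_q = k1_at_maximum(0.9, 16.0)   # large Q, R near unity
print(round(k_small_q, 2), round(k_large_q, 2))
```

For Q = .05 and R = .1 the root falls a little below $k_1 = 6$, and for large Q it falls a little above $k_1 = 1$, in keeping with the range stated above.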

As a further aid in visualizing (32), a family of curves for selected 
$k_1$’s and Q’s with R treated as a continuous variable is presented in 
Figure 6. It will be noted that very small $k_1$’s are not represented in 
the figure. Such lines would in general be similar to the curves for the 
large $k_1$’s. 


FIG. 6. Solution of (32) for selected $k_1$’s and Q’s. Curves shown: Q = 10, $k_1$ = 50; 
Q = 1, $k_1$ = 50; Q = 10, $k_1$ = 100; Q = 1, $k_1$ = 100; Q = 1/10, $k_1$ = 5; Q = 1/10, $k_1$ = 10; 
Q = 1/10, $k_1$ = 50; Q = 1/10, $k_1$ = 100. Abscissa: ratio $k_2/k_1$. Curves were drawn through 
computed points at R = 1/2, 1/4, and 1/8 with a French curve. Expected value of 
F/(N - 4) may be interpreted as a spurious component of F in the test of significance 
for regression coefficient differences, due to the presence of errors in the 
independent variate. Horizontal lines indicate the level F/(N - 4) must attain to 
reach the 5 per cent level of significance for the indicated combined sample sizes. 


It can be seen, as described above, that when the ratio $k_2/k_1$ is 
unity, there is no spurious contribution to F, independent of whatever 
Q and $k_1$ may be. However, the greater the difference between $k_1$ and 
$k_2$, the greater the spurious contribution, though the magnitude of the 
spurious component is also a function of the other variables. If the 
total variance of the independent variate, $s_x^2$, is large relative to the 
error variance component, $s_d^2$, then 

$$\frac{S_{x_1}^2}{S_{x_2}^2} \doteq \frac{k_1}{k_2}.$$


Hence, if the observed variances of the independent variates are approx- 
imately equal, it can be seen from Figure 6 that there will be no appre- 
ciable spurious component in F and, further, that the coefficient of risk 
will not have been changed to any important degree. If, on the other 
hand, there is a considerable difference between the observed variances 
$S_{x_1}^2$ and $S_{x_2}^2$, then it is very likely that R will be small. In such cases 
it will be important to consider the effect Q and $k_1$ will have on the 
coefficient of risk. If either Q is small or $k_1$ is large, the effect will 
tend to be minimized. But association of a large Q with a small $k_1$ will 
lead to a false indication of significance when R is small. The horizon- 
tal lines in Figure 6 indicate the level to which F/(N - 4) must rise in 
order to reach the 5 per cent level of significance for different combined 
sample sizes. 

The reliability of the present tests of significance can be evaluated 
by employing Figure 6. For the skeletal comparisons on the two races 
of Peromyscus, estimates of $k_1$ based on equations (10, 14, 15, 16) 
range from about 70 to 110. Q is always low, 

$$.03 < Q < .55, \tag{34}$$


and the extreme R’s are .64 and .90. If these be entered in Figure 6, it 
will be seen that the most extreme combination gives an F that is far 
from reaching the 5 per cent significance level for N = 400, a number 
close to the maximum used here. 

It thus seems reasonably certain that errors in the independent 
variate have in no way affected the conclusions drawn about regression 
coefficient differences in the comparisons made for the skeletal mea- 
surements between races and between sexes. 

An estimate of the results of comparing the present material with 
collections containing only adults can also be made. Dice (1932 and sub- 
sequent papers) found that the variances of femur length for numerous 
races and local populations of Peromyscus maniculatus range from 
about 0.4 to 0.8 square millimeters. His measurement of femur length 
was slightly different from the one used here, but this should have no 
appreciable effect on the variances. The mice were of the one-year age 
class, including in general those mice which would likely be classed in 
museum collections as adults. Using the estimated $s_d^2$ of .04 (equation 
15), the estimated k’s for Dice’s samples range from 10 to 20. The k’s 
estimated for the present data range from 70 to 110 and, being larger, 
would be identified with $k_1$ in application of formula (31). Hence the 
values of R in the formula should vary between .29 and .09. Accepting 
the Q’s estimated here, ranging from .03 to .55, the spurious contribution 
to the variance ratio may be estimated from Figure 6 or may be 
computed directly from (32). It is likely with a combined sample size 
of 400 or greater that such differences may be indicated when they do 
not exist, although most combinations of Q, R, and $k_1$ would not yield an 
expected value of F that in itself would be significant. It must be 
remembered, however, that the model used here does not evaluate sampling 
fluctuation in the determination of F. Even though the expected value 
of the spurious component is not itself significant, its presence will 
serve to change somewhat the magnitude of errors of the first kind. It 
would seem in comparing samples of the present sort with adults only, 
that if the R values are less than about .3 or .2, considerable caution 
should be used in the interpretation of the results. 
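
The two situations just discussed can be compared by direct computation from (30). The sketch below evaluates the expected spurious F at the corners of the ranges quoted in the text for a combined sample of N = 400. The value 3.86 used for the 5 per cent point of F with 1 and 396 degrees of freedom is an approximate figure from standard tables and is an assumption of this sketch, as are the function and variable names.

```python
def spurious_f(k1, r, q, n):
    """Expected spurious F from equation (30), with k2 = R * k1 and N = n."""
    k2 = r * k1
    num = q * (k1 - k2) ** 2
    den = (k1 + k2 + 2) * (2 * (k1 + 1) * (k2 + 1) + q * (2 * k1 * k2 + k1 + k2))
    return (n - 4) * num / den


F_CRIT = 3.86        # approximate 5 per cent point, 1 and 396 d.f. (assumed)
N = 400
K1_RANGE = (70, 110)     # k1 estimated for the present data
Q_RANGE = (0.03, 0.55)   # limits of Q from (34)


def worst_case(r_values):
    """Largest expected spurious F over the corners of the stated ranges."""
    return max(spurious_f(k1, r, q, N)
               for k1 in K1_RANGE for r in r_values for q in Q_RANGE)


present = worst_case((0.64, 0.90))       # comparisons within the present data
adults_only = worst_case((0.09, 0.29))   # present data against adults only

print(round(present, 3), round(adults_only, 3), F_CRIT)
```

Under these assumptions the largest expected value for the present comparisons is a small fraction of the critical value, whereas the most extreme combination for a comparison with adult-only collections exceeds it.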

Evaluation of the effect on external comparisons has not been made. 
In general, however, the differences between the races are of such mag- 
nitude that statistical analysis of any kind is hardly needed. More im- 
portant is the case of comparisons of regressions among sibships. 
These, it will be recalled, include from four to six groups in any com- 
parison. The examination of the effect of errors in the independent var- 
iate has been worked out only for comparisons involving two groups. 
Pending further analysis of this problem, the differences that appear to 
occur among sibships must be considered tentative, since some of the 
$k_1$ values are small. 

An investigation of the effect of errors on the test of significance for 
adjusted mean differences was not successful. It would appear, how- 
ever, that errors in the independent variate have less effect on the test 
for adjusted mean differences than they do on the test for differences 
between regression coefficients. 


Errors of the Second Kind 


One may wish to inquire how likely a test of significance will be to 
detect a real difference between regression coefficients as compared 
to the likelihood under some alternative situation. Failure to detect a 
difference when it exists is an error of the second kind. An examina- 
tion of the effect of alternative situations on the variance ratio, F, will 
give some insight into the problem. As in the preceding section, a 
general formulation has not been made, and recourse is had to a special 
case. The specific question asked is: how large is F for differences 
between two regression coefficients, when the independent variates are 

$$F_{(d \neq 0)} = \frac{k S_\xi^2 (\beta_1 - \beta_2)^2 (N - 4)}{2 \left[ (k + 1)(S_{e_1}^2 + S_{e_2}^2) + k S_d^2 (\beta_1^2 + \beta_2^2) \right]}. \tag{36}$$


Now define P to be the ratio of (36) to (35). Then 0< P<1. P will 
measure the relative size of the F’s and hence give some idea how likely 
the test would be to distinguish a difference that existed between regres- 
sion coefficients when errors were present in X, relative to the likeli- 
hood of detecting the difference if there were no errors in X. Upon 
simplifying, the expression for P becomes 


$$P = \frac{S_{e_1}^2 + S_{e_2}^2}{\left( \dfrac{k + 1}{k} \right) (S_{e_1}^2 + S_{e_2}^2) + S_d^2 (\beta_1^2 + \beta_2^2)}. \tag{37}$$


From (37) it can be seen that k has considerable importance if it is 
small, but in general has somewhat less effect than it has on errors of 
the first kind. 

As an approximation to P that is obtainable almost upon inspection 
and is useful if k is large, the limit of P as k approaches infinity may 
be taken. Now if $S_{e_1}^2 = S_{e_2}^2$, 

$$\lim_{k \to \infty} P = \frac{1}{1 + \dfrac{S_d^2}{2 S_e^2} (\beta_1^2 + \beta_2^2)} = \frac{2}{2 + \lambda (\beta_1^2 + \beta_2^2)}. \tag{38}$$


The numerical value of $\lambda (\beta_1^2 + \beta_2^2)$ as used in (38) will be approximately 
twice Q computed in the previous section. Thus, from (34), 
$\lambda (\beta_1^2 + \beta_2^2)$ ranges from about .06 to 1.10. This would indicate that the 
presence of errors in the independent variate for the regressions of 
skeletal measurements has reduced the several variance ratios between 
approximately 3 per cent and 35 per cent in the tests employed here. 
While it may seem to be only of academic interest to answer the 
question asked, it nevertheless can furnish some information. If the 
only information gained is the realization of the fact that the association 
of large $\lambda$’s with large regression coefficients makes differences 
between the coefficients more difficult to detect, evaluation of the results 
can be improved. 
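
The percentages quoted above follow directly from (38). The sketch below evaluates the limit at the two extremes, and also checks that the general expression (37) approaches the same limit as k becomes large. The function names and the illustrative sums of squares in the last lines are assumptions made for the example.

```python
def power_ratio(k, se1, se2, sd, beta1, beta2):
    """P from equation (37); se1, se2 and sd are the sums of squares
    S_e1^2, S_e2^2 and S_d^2."""
    s_e = se1 + se2
    return s_e / ((k + 1) / k * s_e + sd * (beta1 ** 2 + beta2 ** 2))


def power_ratio_limit(lam_beta_sq_sum):
    """Limit (38) as k grows, for equal error sums of squares.
    The argument is lambda * (beta1^2 + beta2^2)."""
    return 2.0 / (2.0 + lam_beta_sq_sum)


# Extremes of lambda * (beta1^2 + beta2^2) taken as twice the limits of Q in (34).
loss_low = 1.0 - power_ratio_limit(0.06)
loss_high = 1.0 - power_ratio_limit(1.10)
print(round(100 * loss_low, 1), round(100 * loss_high, 1))

# Illustrative check of (37) against (38): S_e1^2 = S_e2^2 = 1, S_d^2 = 0.5,
# beta1 = beta2 = 1, so that lambda * (beta1^2 + beta2^2) = 1.
near_limit = power_ratio(1e9, 1.0, 1.0, 0.5, 1.0, 1.0)
```

The reductions of roughly 3 and 35 per cent are thus the complements of P at the two ends of the range of Q.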

Another question, however, may be of more interest. It has been 
stated here that it is desirable for several reasons to collect material 
that covers as great a size range as possible. It has also been men- 
tioned that present techniques of collecting wild material result in some 
part of the collected material not being preserved, hence reducing the 
size of the samples available for comparisons. It may be asked what 
effect the saving of young specimens will have on the ability of covari- 
ance analysis to detect differences between regression coefficients. 

From (19) and (36) it may be seen that if $S_t^2$ be increased by some
factor, k is increased by the same factor and if k is large enough so
that $(k + 1)/k \approx 1$, then the same factor is applied to F. As cited earlier,
Dice (1932 and subsequent papers) found that the variances of femur
length for many samples of adult Peromyscus ranged from about 0.4 to
0.8 mm². The variances of the femurs for the samples used in the present
investigation ranged from about 2.8 to 4.3 mm², or were larger than
those for Dice’s samples, on the average, by a factor of about five.
Dice’s material is probably roughly comparable to typical museum
samples as regards these variances. The order of magnitude of $S_d^2$
estimated here indicates that $S_{x_1}^2 / S_{x_2}^2 \approx S_{t_1}^2 / S_{t_2}^2$ for comparing the
present samples with museum ones.

If it be further assumed that about half of the specimens collected 
in the field are discarded, so that the size of the sample could be 
doubled, the effect of saving younger specimens on the magnitude of F 
for testing differences between regression coefficients can be estimated.
From (36) it can be seen that if $S_t^2$ be increased by about five
times, and N be doubled, then F would be increased by a factor of
about ten. Clearly this would be a worthwhile improvement in tech- 
nique! 

A study of the effect of errors in the independent variate on the 
test for adjusted mean differences, as regards errors of the second 
kind, did not result in any exact formulation, even for special cases. 
However, for what it may be worth, certain features that may be per- 
ceived intuitively should be mentioned. First, it appears that the larger 
the variances of the independent variates, the more likely that a significant
difference between adjusted means will be detected if it exists.
Second, if $\bar{X}_1 = \bar{X}_2$, the test is essentially a function of the $S_e^2$’s, and not of
the $S_t^2$’s or the $S_d^2$’s. Third, in comparing samples of adults only when
$\bar{X}_1$ and $\bar{X}_2$ differ considerably, it is likely that no adjusted mean differences
will be detected, even though the mean differences are large.

Several factors which might invalidate the conclusions drawn from
the application of covariance analysis to measurement data have been 
shown to be unimportant in the present investigation, at least for com- 
parisons between sexes and between races for the regressions of the 
skeletal measurements. Factors other than the presence of errors in 
the independent variate seem unlikely to affect the conclusions drawn 
from any of the comparisons made. 

The regressions can be tested for departures from linearity and 
for lack of homogeneity of the error variances about regression lines. 
The present data are linear and have homogeneous variances. Log- 
arithmic transformation is not necessary for these data and, in fact, 
would be improper. The ages of the specimens need not be known, so 
long as the data are collected over some reasonably large range of 
ages. The amount of error in the independent variate has been esti- 
mated for those regressions dealing with skeletal measurements. These 
errors are shown to have negligible effect on the coefficient of risk 
when testing for differences between regression coefficients, again if 
the specimens cover a large range of ages. The test is somewhat less 
likely to detect differences between regression coefficients than it would 
be if no errors were present in the independent variate. For detecting
differences, mouse measurement data for field-caught specimens collected
over the age range from weaning time to full adulthood are estimated
to be about ten times more efficient than data collected for adults
only, which is the present practice. All conclusions clearly indicate the
need for collecting and preserving specimens of all ages, rather than
adults only.


COMPARISON OF SAMPLES 


In the preceding section it was shown that under certain conditions 
the application of covariance analysis to measurement data will yield 
conclusions which may be relied upon. These necessary conditions have 
been fulfilled for comparisons between sexes and between races of re- 
gressions for skeletal measurements. For comparisons between races 
the differences between regression coefficients and/or adjusted means 
for external measurements are great enough so that no concern about 
the validity of the conclusions regarding significance need be felt. The 
differences indicated among sibships, however, still need further evalu- 
ation before definite conclusions can be drawn. This is true because 
several of the sibships did not meet the important criterion of covering 
a wide range of sizes. Even in these cases it would appear that differ- 
ences among adjusted means are reliable. 


Comparison between Sexes


It would be of some interest if a consistent pattern of differences 
between regression coefficients and between adjusted means was brought 
out by the analysis, as summarized in Table I. Such was not the case,
however, for sexual differences. For bairdi, the coefficients for regressions
of humerus length and skull length on femur length were
significantly different between sexes at the 1 per cent level. Adjusted 
mean differences between sexes were significant at the 1 per cent level 
for regressions of skull width and skull depth on femur length. Adjusted 
mean differences between sexes for regressions of tail length and hind- 
foot length on body length were significant at the 5 per cent level. No 
significant differences existed between sexes for the regression of ear 
length on body length. 

For gracilis, at least in part because sample size was smaller, 
fewer differences could be detected. The regression coefficients of 
humerus length and skull width on femur length were different between
sexes at the 5 per cent level. A sexual difference in regression coefficients,
significant at the 1 per cent level, was found for the regression
of hind-foot length on body length. An adjusted mean difference between 
sexes in the regression of ear length on body length was significant at 
the 5 per cent level. No differences were detected between sexes for 
regressions of skull length and skull depth on femur length, or for re- 
gression of tail length on body length. Thus the only agreement between 
the races in the type of sexual difference is in the regression of humerus 
length on femur length, and here the level of significance is different. 

For both bairdi and gracilis the males have a larger regression co- 
efficient for every regression than do the females (Table I), although it 
is not significantly so except in four of the fourteen regressions. This 
may be indicative of the findings yet to be derived from more extensive 
data. If it be assumed that the probability of the males having a larger 
coefficient than the females is one-half and if the several regressions 
may be considered independent, then the probability of the males being 
greater in all fourteen cases is only 1/16384. The independence as- 
sumption cannot really be justified, but nevertheless, the complete as- 
sociation of larger regression coefficients with the males provides an 
interesting topic for further investigation. 
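
The figure of 1/16384 follows directly from the two stated assumptions. A minimal Python check, assuming independence among the fourteen regressions and a probability of one-half for each (the function name is illustrative only):

```python
from fractions import Fraction


def prob_all_larger(n_regressions, p_larger=Fraction(1, 2)):
    """Probability that one sex has the larger coefficient in every one of
    n independent regressions, each with probability p_larger."""
    return p_larger ** n_regressions


if __name__ == "__main__":
    # Fourteen regressions: seven for each of the two races.
    print(prob_all_larger(14))
```

Exact rational arithmetic is used so that the result can be compared with the fraction in the text without rounding. If the regressions are positively correlated, as the caution about independence suggests they may be, the true probability would be larger than this value.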

Because of the lack of significance between sexes for several of 
the regression coefficients, the adjusted means for these regressions 
were computed. In bairdi four of the five adjusted mean differences 
between sexes were significant, and in all five cases, the male had the 
greater mean. In gracilis, however, the males had greater means in 
only two, neither significant; the females had the greater means in three 
cases, two of them significant. Interpretation of adjusted mean differ- 
ences is more difficult than interpretation of differences between regres- 
sion coefficients. A slight difference between coefficients, if the lines 
intersect at some distance from the sample means, will appear as an 
adjusted mean difference. On the other hand, a greater difference be- 
tween coefficients, if the lines intersect close to the means, might go 
undetected as either type of difference. Hence, definite conclusions
about sexual differences between adjusted means do not seem possible 
from the present data. 
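
The geometry behind this caution can be illustrated with a small Python sketch. It treats each sample's regression as a straight line and measures the vertical gap between two lines at a common value of the independent variate. All numbers are hypothetical, and the sketch is not the covariance computation itself, which adjusts the means with a pooled coefficient.

```python
def gap_at(x, slope_1, slope_2, x_cross):
    """Vertical distance between two straight lines at x, given that they
    intersect at x_cross: y1 - y2 = (slope_1 - slope_2) * (x - x_cross)."""
    return (slope_1 - slope_2) * (x - x_cross)


if __name__ == "__main__":
    x_mean = 15.0  # hypothetical common sample mean of the independent variate

    # Slight difference between coefficients, lines crossing far from the mean.
    far = gap_at(x_mean, 0.84, 0.82, x_cross=-5.0)

    # Greater difference between coefficients, lines crossing close to the mean.
    near = gap_at(x_mean, 0.92, 0.82, x_cross=14.5)

    print(f"gap, crossing far from the mean: {far:.2f}")
    print(f"gap, crossing near the mean:     {near:.2f}")
```

With these hypothetical values the first case shows a gap at the mean about eight times that of the second, although its difference between coefficients is only one-fifth as large.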


Comparisons between Races 


Nearly complete consistency within sexes was found in comparisons 
of the regressions between races (Table I). Differences in regression 
coefficients were significant at the 1 per cent level (in both sexes) be- 
tween bairdi and gracilis for regressions of skull width and skull depth 
on femur length and for tail length on body length. In addition, the re- 
gression of ear length on body length differed between the races at the 
1 per cent level of significance for the females and at the 5 per cent 
level for the males. No differences between the races were found in
the coefficients for regressions of humerus length on femur length and
hind-foot length on body length, but adjusted means for these two
regressions differed at the 1 per cent level in both sexes. The only
inconsistency involves the regression of
skull length on femur length. For this, the males differed in their re- 
gression coefficients at the 5 per cent level, while the females did not, 
but instead differed in means at the 1 per cent level. The estimated 
regression coefficient for the male bairdi, .89 mm/mm, seems abnor- 
mally high relative to the other three groups. The bairdi females and 
both sexes of gracilis have coefficients of .82 mm/mm. No explanation 
for this phenomenon can be offered at present, but its effect on the type 
of difference detected is evident. Reference to Figure 7 will show that 
for the males, even in the absence of any difference in regression co- 
efficients, there would quite obviously be a difference in adjusted means, 
as there is for the females. 

In comparing the races for body proportions and absolute size, it is 
clear that neither race is consistently larger, although gracilis usually 
is. For the regression of humerus length on femur length the adjusted 
mean difference was about one-quarter of a millimeter in both sexes, 
with gracilis the larger. For the regression of skull length on femur 
length, the male bairdi had a much larger coefficient, as mentioned 
above, than the other three groups, which led to a significant difference 
between the races when the males but not when the females were com- 
pared. The races differed in adjusted means within females by a little 
over a millimeter, with gracilis the larger. Reference to Figure 7 will 
show that a mean difference of about the same amount exists between 
the males of the two races, if the effect of the differences in regression 
coefficients is ignored. 

Comparison of regressions of skull width and skull depth on femur 
length reveals an interesting difference. Within either sex the bairdi 
samples showed a greater coefficient for both regressions and all the 
differences were significant at the 1 per cent level. Hence, adjusted 
mean differences were not computed. However, inspection of Figure 7 
will show that the absolute width of the skull is greater in gracilis than 
in bairdi, if the two races are compared at the same femur length. Yet
the larger the bairdi mice become, as measured by femur length, the
more nearly their skull width approaches that of gracilis. Skull depth,
on the other hand, is greater in bairdi than in gracilis and the larger
the mice become, as measured by femur length, the greater the excess
of bairdi skull depth. This is the only measurement investigated in
which, except for very young mice, bairdi is absolutely larger than
gracilis.

FIG. 7. Regression lines for skeletal measurements of bairdi and gracilis.
Separate lines are drawn for each sex. Femur length is the independent variate for
all regressions. Both axes are to the same scale for all regressions, but there has
been some vertical displacement along the Y-axis so that all lines could be drawn on
one page. Projection of the regression lines onto the X-axis will give the size range
covered by each sample.

For the regressions of tail length and ear length on body length, the 
regression coefficients are significantly different between the races in 
both sexes, and gracilis has the larger ones in all cases. For the regression
of hind-foot length on body length, in neither sex could a
difference between races be detected in the coefficients. However,
gracilis has a greater adjusted mean for this regression in both sexes,
and the difference is significant. Inspection of Figure 8 will indicate
that real differences in means exist for all three regressions of external
measurements, whether or not they are due to differences in the
various regression coefficients, and that gracilis is always the larger.
The effect of the difference in regression coefficients is particularly
pronounced for tail length; the larger the mice become, the greater the
difference in tail length.

FIG. 8. Regression lines for external measurements of bairdi and gracilis.
Separate lines are drawn for each sex. Body length is the independent variate for all 
regressions. Both axes are to the same scale for all regressions, but there has been 
some vertical displacement along the Y-axis so that all lines could be drawn on one 
page. Projection of the regression lines onto the X-axis will give the size range 
covered by each sample. 

Comparison between Sibships


Within bairdi sibships, analysis indicated that significant variation 
occurs for all regressions. In gracilis, differences were not found for 
the regressions of skull width on femur length in either sex or for humerus
length on femur length in the males.


APPLICATION OF RESULTS 


The fact that, for the deermice used in this study, considerable
progress in analysis can be made without knowing anything about
the ages of the specimens indicates that investigation of other small
mammals from the same viewpoint would be desirable. If the factor of
age should prove to be of general unimportance, more efficient use of 
the potentially available field-caught material would be possible. In the 
laboratory considerable savings could also be made. For example, 
about half of the mice in any group could be prepared as specimens at 
an early age, perhaps at thirty days. The remainder could be kept until 
they were nearly fully grown, a time that for these purposes need not be 
more than six to eight months. These latter animals could be used as 
breeders, if such were the requirements of the particular experiment. 
Cage space and food requirements, as well as the necessary labor of 
caring for the caged mice would be approximately halved. The results 
should be directly comparable to wild populations in the locality where 
the stocks were taken, since most small mammals do not survive long 
under natural conditions, often much less than one year. 

The morphological differences demonstrated between the two races 
of deermice indicate that there are numerous genetic differences be- 
tween the two populations. It will be recalled that the existence of some 
of the morphological differences was suspected before the analysis was 
undertaken. It seemed likely that the relative length of the tail, hind- 
foot, and ear in gracilis was greater than in bairdi. Of course, these 
apparent differences could have been due to differences in adult size, 
but this was in no instance found to be the case. That skull length was 
relatively longer in gracilis than in bairdi was also strongly suspected. 
There was no way of knowing from using the usual simple ratio method 
what the relative values would be with respect to humerus length, skull 
width, and skull depth. The present analyses demonstrate, however, 
that the first two measurements are relatively longer in gracilis, and 
the last, skull depth, is relatively greater in bairdi. 

Findings are in agreement with the interpretation placed by taxono- 
mists on the relationships between bairdi and gracilis. According to 
Osgood (1909) the race bairdi is supposed to intergrade to the west with 
nebrascensis, which in turn intergrades to the northwest with the race 
osgoodi. The subspecies osgoodi intergrades to the north with borealis, 
which in turn intergrades to the east with maniculatus. The subspecies 
maniculatus finally intergrades to the east with gracilis. Since four
distinguishable races are presumed to form intermediate links in the 
chain between bairdi and gracilis, it is not surprising that all the dif- 
ferences between bairdi and gracilis are significant. Of the fourteen 
comparisons made between kinds, seven within each sex, twelve were 
significantly different at the 1 per cent level, and the remaining two 
were significantly different at the 5 per cent level (see Table I). Note, 
furthermore, that bairdi belongs to the group that has been informally 
recognized as the “southern” group, mostly prairie-dwelling races, 
which includes primarily small-bodied, short-tailed, short-eared forms, 
whereas gracilis is typical of the “northern” group, primarily forest- 
inhabiting races, with larger bodies, and longer tails, hind feet, and 
ears. 

None of the differences detected was merely a function of differ- 
ence in general size of the races. This discovery is in agreement with 
the previous statement that a large number of genetic differences in- 
tervene between bairdi and gracilis. In addition to the genetic factors 
controlling adult size, there must be other genetic differences that af- 
fect the average rates at which the various parts grow relative to one 
another. The possibility of other proportional differences due solely to 
adult size differences, however, has not been excluded. 

The amount of sexual difference is considerably more than has 
been previously known for Peromyscus maniculatus. Sumner (1924), 
Dice (1932, 1933, 1937), and Murie (1933) had reported that the hind 
foot of various races of the species was longer in the males, but had 
been unable to detect any consistent difference in the other measurements.
That sexual differences in body proportions should exist is not
surprising; their discovery here merely emphasizes the sensitivity of
covariance analysis in detecting relatively minor differences.

The intersibship variability within a race poses interesting ques- 
tions which, unfortunately, cannot be adequately answered for the two 
races by the present data. Further data and more analyses are needed 
in order to ascertain the amount of intersibship variability and the ex- 
tent of its occurrence. The answers to these questions, when obtained, 
will be helpful in providing further clues to the mode of evolution in the 
species concerned. The genus Peromyscus is known chiefly from the 
Pleistocene and Recent epochs, and thus has a very short known history. 
It may reasonably be expected that further evolution is still taking place. 
The presence of intersibship variability indicates that this is possible. 
Variation within a sibship may be ascribable either to purely nongenetic
factors or to a combination of nongenetic factors and very minor genetic
ones. Under neither of these assumptions would rapid evolution be possible.
But if genetic variation among sibships is common, the potentialities
for rapid differentiation are considerable. Since these mice were
reared under rather constant environmental conditions, it may be
presumed that at least a large part of the variability is genetic in origin.
A critical evaluation of several genetic models to discover which
ones lead to greater intersibship variation would be likely to elucidate
some of the problems connected with the blending type of inheritance
which proportional differences usually exhibit. It is probable that the
fewer the number of genes involved in producing a given difference between
populations, the more likely one will be to discover such differences.
Hence, relatively few genes may have determined the variation
in each of the proportions studied.

*The names used by Osgood have been changed to conform with current practice.
These changes are: luteus of Osgood = nebrascensis; nebrascensis of Osgood =
osgoodi; arcticus of Osgood = borealis.


SUMMARY 


The method of covariance analysis has been applied to the comparison
of certain body and skeletal measurements of two races of the deermouse,
Peromyscus maniculatus bairdi and P. m. gracilis.
The mice were born and reared in the laboratory.

An evaluation of the applicability of covariance analysis to this type 
of measurement data has been made. It is shown by application of a 
formal test for linearity to the regressions that the customary practice 
of employing a logarithmic transformation to the measurements was 
not needed here and, in fact, would have been undesirable. An applica- 
tion of Bartlett’s test for homogeneity of variances is suggested for 
testing homogeneity of error variances about a regression line. For 
the present data it indicated that the error variances are homogeneous. 
The inadequacy of simple ratios as a substitute for covariance analysis 
is emphasized. For linear regressions it is shown that the ages of the 
specimens need not be known. The variances of errors in the inde- 
pendent variates have been estimated. It is demonstrated that for com- 
parisons of two samples these errors in the present data are not large 
enough to appreciably bias the coefficient of risk. They may, however, 
be important in comparisons of measurements for adults only. The 
errors reduce somewhat the ability of the analysis to detect differences 
between samples. 

Regressions of humerus length, skull length, skull width, and skull 
depth, all on femur length, and regressions of tail length, hind-foot 
length, and ear length, all on body length have been studied. Significant 
differences occur between the sexes within each race in most regres- 
sions. There is some evidence that males may generally have larger 
regression coefficients than females. Thus, it is necessary to make 
comparisons between races and among sibships separately for each sex. 
For all measurements except skull depth, gracilis was larger than
bairdi. In no comparison between races could a single regression line 
be fitted to both races. Thus, the races differed significantly in each 
regression and in no instance was the difference merely due to a dif- 
ference in total size. Analysis of differences between sibships is based 
on fewer individuals, yet variation among sibships is significant for 
most of the regressions. 

Intersibship variation indicates that the species is still capable of
rapid evolution. Agreement of the present results with the usual taxonomic
interpretations is excellent.